Nucleic acid corresponding to a gene of chromosome 22 involved in recurrent chromosomal translocations associated with the development of cancerous tumors, and nucleic acids of fusion resulting from said translocations

ABSTRACT

Translocations of chromosome 22 are associated with various cancers. Hybrid DNA sequences, having a portion of the Ews gene of chromosome 22 and a portion of either the Hum-Fli-1 gene of chromosome 11, the Erg gene of chromosome 21, or the Atf-1 gene of chromosome 12, are disclosed. Proteins encoded by these hybrid DNAs are disclosed. Diagnosis of specific cancers based on detection of the translocations is disclosed.

This application was filed under 35 U.S.C. 371 PCT/FR93/00494, filed May 19, 1993.

The present invention relates to a nucleic acid including all or part of the nucleic sequence of a gene of chromosome 22 involved in the recurrent chromosomal translocations associated with the development of cancerous tumors. The subject of the invention is also hybrid nucleic acids corresponding to the products of fusion resulting from the chromosomal translocations in which this gene of chromosome 22 is involved.

The invention relates more particularly to a nucleic acid including all or part of the nucleic sequence of the gene of chromosome 22 involved in the recurrent chromosomal translocations t(11;22), t(21;22) and t(12;22), as well as the hybrid nucleic acids resulting from the fusion of this gene of chromosome 22 with the genes of chromosomes 11, 21 and 12, respectively, which are involved in these translocations. The invention also relates to the mRNAs originating in the DNA of the gene of chromosome 22 and the fusion DNAs, as well as the cDNAs that can derive from them and the proteins for which they code.

The invention also relates to the detection of the gene of chromosome 22 involved in the chromosomal translocations, as well as the fusion genes resulting from these translocations, with the aid of probes prepared on the basis of precursor nucleic acids, with a view to diagnosing Ewing's sarcoma and related tumors in subjects suffering from "small round cell", or small cell, tumors or primitive peripheral neurectodermic tumors, or malignant melanoma of the soft tissues; the invention also relates to the detection, for the same purposes, of products for which these genes code, and the products and reagents for implementing these methods.

Recurrent chromosomal translocations represent a mechanism of oncogene activation and hence are implicated in the appearance of numerous pathogenic tumors (E. Solomon, J. Barrow, A. D. Goddard, Science 254, 1153-1160 (1991)). Among them, those known as small cell tumors represent a group of heterogeneous cancers. Precise diagnosis of these tumors is essential so that the therapeutic protocol to be used to treat them can be chosen correctly. A subset of these tumors involves a chromosomal translocation t(11;22)(q24;q12) and is sensitive to the same therapeutic protocol.

Ewing's sarcoma represents the most frequent secondary bone tumor in the child. Despite the lack of a morphological marker, Ewing's sarcoma cells occasionally express antigens or can cause the expression of characteristic morphological traits of neural differentiation (M. Lipinski, K. Braham, I. Philip et al., Cancer Res. 47, 183-187 (1987); A. O. Cavazzana, J. S. Miser, J. Jefferson, T. J. Triche, Am. J. Pathol. 127, 507-518 (1987)). Thus techniques for diagnosing small cell tumors use the conventional methods of anatomopathology and cytology, and hence represent a diagnosis by exclusion, which is not entirely reliable. Demonstrating the presence of a chromosomal translocation t(11;22) in a small cell tumor can be done by a karyotype study; this study requires the cells to arrive at the laboratory alive, which is often difficult to achieve and results in an elevated failure rate; furthermore, there are difficulties in interpretation.

Ewing's sarcoma (ES) has been connected with a subtype of peripheral primitive neurectodermic tumors (PNET) called peripheral neuroepithelioma (PN). This relationship is also supported by the highly specific expression of the MIC2 antigen both in Ewing's sarcoma (ES) and in peripheral neuroepithelioma (PN), while in the majority of other human tumors this antigen is not expressed (I. M. Ambros, P. F. Ambros, S. Strehl et al., Cancer 67, 1886-1893 (1991); P. Garin-Chesa, E. J. Fellinger, A. G. Huvos et al., Am. J. Pathol. 139, 275-286 (1991)). With other rare subtypes of primitive neurectodermic tumors (J. Whang-Peng, C. E. Freter, T. Knutson, J. J. Nanfro, A. Gazdar, Cancer Genet. Cytogenet. 29, 155-258 (1987); J. P. Chadarevian, M. Vekemans, T. A. Seemayer, N. Engl. J. Med. 311, 1702-1703 (1984); A. O. Cavazzana, S. Navarro, R. Noguera et al., Adv. Neuroblastoma Res. 2, 463-473 (1988); N. V. Vigfusson, L. J. Allen, J. H. Philip, T. Alschibaja, W. G. Riches, Cancer Genet. Cytogenet. 22, 211-218 (1986)), Ewing's sarcoma and peripheral neuroepithelioma share a highly specific and cytogenetically identical chromosomal translocation t(11;22)(q24;q12) (A. Aurias, C. Rimbaut, C. Buffe, J. Dubousset, A. Mazabraud, N. Engl. J. Med. 309, 496-497 (1983); C. Turc-Carel, I. Philip, M. P. Berger, T. Philip, G. M. Lenoir, N. Engl. J. Med. 309, 497-498 (1983); J. Whang-Peng, T. J. Triche, T. Knutsen, J. Miser, E. C. Douglass, M. A. Israel, N. Engl. J. Med. 311, 584-585 (1984)). This translocation has been observed in 83% of cases of Ewing's sarcoma. More complex variants of this translocation or translocations have been observed in 9% of other cases and consistently involve the 22q12 band (C. Turc-Carel, A. Aurias, F. Mugneret et al., Cancer Genet. Cytogenet. 32, 229-238 (1988)). The region 22q12 thus appears to be one of the main sites where recurrent chromosomal alterations encountered in a defined group of human tumors appear.

The research work done on the physical mapping of the long arm of chromosome 22 (F. Zhang, O. Delattre, G. Rouleau, J. Couturier, D. Lefrancois, G. Thomas, A. Aurias, Genomics 6, 174-177 (1990); F. R. Zhang, A. Aurias, O. Delattre, M. H. Stern, J. Benitez, G. Rouleau, G. Thomas, Genomics 7, 319-324 (1990); O. Delattre, C. J. Azambuja, A. Aurias, J. Zucman, M. Peter, F. Zhang, M. C. Hors-Cayla, G. Rouleau, G. Thomas, Genomics 9, 721-727 (1991)), and isolation of probes in proximity with the chromosomal breakpoint (J. Zucman, O. Delattre, C. Desmaze, C. Azambuja, G. Rouleau, P. De Jong, A. Aurias, G. Thomas, Genomics, in press) have led the present inventors to determine that the translocation t(11;22)(q24;q12) fuses a gene carried on chromosome 22 and a gene carried on chromosome 11; this fusion leads to the formation of a hybrid gene associated with the pathology. These two genes have been characterized, and it has been possible to define the products of their fusion.

The gene of chromosome 22 involved in the translocation t(11;22), called Ews, has been isolated and its cDNA has been sequenced; the sequence of the protein for which it codes was deduced from the cDNA sequence; the Ews gene includes a region called EWSR1 of approximately 7 Kb at the level of which the breakpoint of chromosome 22 is located.

The gene of chromosome 11 involved in the translocation t(11;22), called Hum-Fli-1, has been isolated and its cDNA has been sequenced; the sequence of the protein for which it codes was deduced from the cDNA sequence; the Hum-Fli-1 gene includes a region called EWSR2 of approximately 40 Kb at the level of which the breakpoint of chromosome 11 is located. Furthermore, the inventors have confirmed the strong homology of the Hum-Fli-1 gene with the genes of the Ets family and more particularly with the murine Fli-1 gene (V. Baud et al., Genomics, Vol. 11, 223-224, 1991; Y. Ben-David et al., Genes & Development, Vol. 5, No. 6, 1991).

The inventors have also demonstrated that in approximately 12% of cases of Ewing's sarcoma, the Ews gene of chromosome 22 fuses with the Erg gene carried on chromosome 21. This Erg gene is also a member of the Ets family of transcription factors.

Furthermore, the research work done by the inventors on recurrent chromosomal translocations in which the Ews gene is involved has led them to determine that the recurrent chromosomal translocation t(12;22)(q13;q12) associated with soft tissue malignant melanoma (STMM) fuses the Ews gene with the Atf-1 gene carried on chromosome 12. The link between the translocation t(12;22) and soft tissue malignant melanoma has been described, particularly by J. A. Bridge et al. (J. A. Bridge, D. A. Borek, J. R. Neff, M. Huntrakoon, Am. J. Clin. Pathol. 93, 26-31, 1986) and G. Stenman et al. (G. Stenman, L.-G. Kindblom, L. Angervall, Genes Chrom. Cancer 4, 122-127, 1992).

Soft tissue malignant melanoma is a grave tumor that most often develops in the tendons and aponeuroses in subjects aged 15-35 (E. B. Chung, F. M. Enzinger, Am. J. Surg. Pathol. 7, 405-413, 1983; L. Epstein, A. O. Martin, R. Kempson, Cancer Res. 44, 1265-1274, 1984). Although STMM shares several phenotypic features with cutaneous malignant melanomas, the translocation t(12;22)(q13;q12) is specific to this type of tumor (F. Mitelman, Catalog of Chromosome Aberrations in Cancer, New York: Alan R. Liss, 1988). Because the breakpoint of chromosome 22 in the translocation t(12;22)(q13;q12) cannot be cytogenetically distinguished from that of the translocation t(11;22), the inventors have studied the rearrangement of the Ews gene in the translocation t(12;22).

SUMMARY OF THE INVENTION

The invention consequently relates to a nucleic acid including all or part of the nucleotide sequence of the Ews gene of chromosome 22, located at the level of the breakpoint of this chromosome in various recurrent chromosomal translocations associated with the development of cancerous tumors.

The invention also relates to the hybrid DNAs resulting from the fusion of the Ews gene with other genes at the time of recurrent chromosomal translocations involving chromosome 22, essentially constituted by part of the nucleotide sequence of the Ews gene and by part of the nucleotide sequence of the gene located at the level of the breakpoint of the other chromosome involved in the translocation.

More particularly, the invention relates to the hybrid DNAs essentially including the part of the nucleotide sequence of the Ews gene up to the 7 Kb region at the level of which the breakpoint of chromosome 22 is located.

Among these hybrid DNAs including the part of the nucleotide sequence of the Ews gene from its origin up to the 7 Kb region at the level of which the breakpoint of chromosome 22 is located, the invention relates more particularly to those resulting from the following chromosomal translocations:

t(11;22)(q24;q12), associated with at least 80% of cases of Ewing's sarcoma or related tumors;

t(21;22), associated with at least 10% of cases of Ewing's sarcoma or related tumors;

t(12;22), associated with soft tissue malignant melanoma.

The invention also relates to the mRNAs originating in the DNA of the gene of chromosome 22 and the fusion DNAs, as well as the cDNAs that can derive from them, as well as the proteins for which they code.

Accordingly, the subject of the invention is the hybrid DNAs resulting from the translocation t(11;22)(q24;q12) essentially constituted by the fusion of a part of the nucleotide sequence of the Ews gene and the part of the nucleotide sequence of the Hum-Fli-1 gene located at the level of the breakpoint of chromosome 11 in said translocation. More precisely, the invention relates to the hybrid DNAs including the part of the nucleotide sequence of the Hum-Fli-1 gene from the 40 Kb region EWSR2 at the level of which the breakpoint of chromosome 11 is located in this translocation, up to its 3' end.

The inventors have studied in detail the mechanisms that give rise to the various fusion genes of the translocation t(11;22).

The exon structures of the Ews and Hum-Fli-1 genes have been determined. It appears that the precise positions of the breakpoints in the regions EWSR1 and EWSR2 are most often, and may be exclusively, located within the introns of the two genes, so that the splicing that occurs during maturation of the primary transcript can restore an open reading frame. The position of the exons with respect to the restriction sites has been defined for the Ews and Hum-Fli-1 genes.

The open reading frames are divided, and each intron between two coding exons interrupts the reading frames; the majority of the intron-exon junctions have been determined for both the Ews and the Hum-Fli-1 genes.
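The frame constraint described above can be illustrated with a minimal sketch. It assumes that a fusion transcript stays in frame when the coding length retained from the 5' partner and the coding length skipped from the 3' partner leave the same remainder modulo 3. The function names and the exon lengths are hypothetical and are not taken from the sequences of the Ews or Hum-Fli-1 genes.

```python
def junction_phase(coding_exon_lengths):
    """Return the codon phase (0, 1 or 2) reached after the listed coding exons."""
    return sum(coding_exon_lengths) % 3


def fusion_in_frame(five_prime_exons_kept, three_prime_exons_skipped):
    """Check whether splicing the retained 5' exons onto the 3' partner keeps the frame.

    five_prime_exons_kept: coding lengths of the 5' partner exons that are retained.
    three_prime_exons_skipped: coding lengths of the 3' partner exons that are lost.
    The downstream frame is preserved when both phases agree.
    """
    return junction_phase(five_prime_exons_kept) == junction_phase(three_prime_exons_skipped)


# Hypothetical exon lengths, for illustration only.
print(fusion_in_frame([100, 50], [30]))  # phases 0 and 0
print(fusion_in_frame([100], [30]))      # phases 1 and 0
```

A breakpoint anywhere inside a given intron gives the same answer, because only the exons on either side of the splice junction enter the calculation.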

Based on this work, it has been possible to deduce the approximate size of the introns, which are the sites of the chromosomal breakpoints, and to contemplate a great number of possible fusion products. Furthermore, the promoter sequence of the Ews gene has been determined.

Advantageously, the hybrid DNAs according to the invention corresponding to the fusion products resulting from the recurrent chromosomal translocation t(11;22) are essentially constituted by a part of the nucleotide sequence of the cDNA of the Ews gene, and more precisely the nucleotide sequence of a cDNA resulting from the fusion of the Ews and Hum-Fli-1 genes.

Nucleotide probes or their homologues, capable of hybridizing with all or part of the nucleotide sequence of the Ews or Hum-Fli-1 genes or with the cDNA of one of these genes, have been prepared. Among them, probes capable of hybridizing specifically with a part of the Ews gene or a part of the Hum-Fli-1 gene have been selected.

Probes that are complementary to all or part of the hybrid DNAs, in particular corresponding to the part of the Ews and Hum-Fli-1 genes altered by the translocation t(11;22), or with the mRNA or cDNA of the fusion genes, have also been obtained with a view to detecting, by hybridization, the possible presence of a translocation t(11;22) in the tumor cells of a subject.

Synthetic oligonucleotides have been prepared from the nucleotide sequences of the Ews and Hum-Fli-1 genes and the products resulting from the fusion of these two genes, in order to prepare the cDNA corresponding to the fusion zone by reverse transcription of mRNA originating in a specimen to be analyzed. In vitro gene amplification of this cDNA by PCR, with the aid of oligonucleotide primers, enables the analysis of the amplified products by simple radioactive methods or colorimetric methods of the gel electrophoresis and ethidium bromide staining type, or by immunological or fluorographic detection.

Consequently, the invention relates to the nucleotide sequences, or their analogs, that constitute genetic probes capable of hybridizing with the nucleotide sequence of the Ews gene, or the hybrid DNAs resulting from the fusion of the Ews and Hum-Fli-1 genes, or their mRNAs and cDNAs, as well as the oligonucleotides originating in these sequences and constituting primers for performing reverse RNA transcription or the implementation of a PCR gene amplification process.

The chromosomal translocation t(11;22) observed in neurectodermic tumors gives rise to hybrid fusion genes capable of coding for the chimera proteins that have preserved the N-terminal part of the EWS protein coded by the Ews gene and the C-terminal part of the HUM-FLI-1 protein coded by the Hum-Fli-1 gene.

Thorough study of the Ews gene shows that it codes for a protein of 656 amino acids with two domains of different structure. The C-terminal part is characterized by three regions rich in glycine and arginine residues and one region of 85 amino acids homologous to the RNA-binding consensus domain; this strongly suggests that the C-terminal part of the EWS protein could interact with single-strand nucleic acids, and more particularly with RNA. The N-terminal portion of the EWS protein (NTD-EWS) includes a repeated and degenerate polypeptide that has a consensus sequence SYGQQS, which has weak homology with CTD-PolII.
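A repeated, degenerate motif of this kind can be located by sliding the consensus along a protein sequence and tolerating a small number of mismatches. The sketch below is only an illustration under that assumption: the function name, the mismatch threshold and the toy sequence are hypothetical, and the toy sequence is not the EWS protein sequence of SEQ ID NO:2.

```python
def degenerate_repeats(protein, consensus="SYGQQS", max_mismatches=1):
    """List (position, window, mismatches) for windows close to the consensus.

    Positions are 0-based. A window is reported when it differs from the
    consensus at no more than max_mismatches residues (Hamming distance).
    """
    k = len(consensus)
    hits = []
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Toy sequence with one exact and one degenerate copy of the motif.
toy_sequence = "AASYGQQSTTSYSQQSAA"
for position, window, mismatches in degenerate_repeats(toy_sequence):
    print(position, window, mismatches)
```

Raising `max_mismatches` reports more divergent copies at the cost of more chance matches, so the threshold would need to be chosen against the real sequence.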

The amino acid sequence of the chimera proteins can be deduced from the fusion cDNAs resulting from the translocation t(11;22); several of these proteins have been produced in vitro, in order in particular to prepare polyclonal or monoclonal antibodies capable of binding to the cells that produce these proteins, and consequently that exhibit the translocation t(11;22). The invention accordingly also relates to the chimera proteins resulting from the chromosomal translocation t(11;22), as well as the antibodies for immunological detection of the presence of these proteins, and more particularly a chimera protein, in a biological specimen taken from a subject likely to carry a chromosomal translocation t(11;22).

The invention also relates to the methods of detecting a fusion gene resulting from the chromosomal translocation t(11;22).

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows a map of the region EWSR1 on chromosome 22.

FIG. 2 shows the detection of abnormal genome fragments in six tumors.

FIG. 3 shows a restriction map of the regions involved in the chromosomal translocation t(11;22)(q24;q12) and two derivatives of the translocation.

FIG. 4 shows Northern blot detection of abnormal transcripts in Ewing's sarcoma or cell lines.

FIG. 5 shows reverse transcription and PCR detection of three types of chimeric transcripts.

FIG. 6 shows the DNA sequence of the gene Ews (SEQ ID NO:1) and the amino acid sequence (SEQ ID NO:2) encoded by the DNA sequence.

FIG. 7 shows the DNA sequence of the gene Hum-Fli-1 (SEQ ID NO:3) and the amino acid sequence (SEQ ID NO:4) encoded by the DNA sequence.

FIG. 8 shows the nucleotide sequence of the fusion cDNA of the Ews gene (SEQ ID NO:5) and the Hum-Fli-1 gene (SEQ ID NO:6).

FIG. 9A shows the various domains of the protein encoded by the Ews gene.

FIG. 9B shows the most significant alignments of the protein EWS, including EWS (SEQ ID NO:7), Dpen P19 (SEQ ID NO:8), H nucl#4 (SEQ ID NO:9), HsnRNP Ul (SEQ ID NO:10), HPABP#3 (SEQ ID NO:11), HhnRNP A1#2 (SEQ ID NO:12), and HhnRNP B1#2 (SEQ ID NO:13).

FIG. 10 shows a schematic representation of the proteins encoded by the Ews gene and the Hum-Fli-1 gene and a fusion gene of Ews and Hum-Fli-1.

FIG. 11 shows a restriction map of the Ews and Hum-Fli-1 genes.

FIG. 12 shows the exon structure of the Ews gene, including Exon 1 (SEQ ID NO:14), Exon 2 (SEQ ID NO:15, SEQ ID NO:16), Exon 3 (SEQ ID NO:17, SEQ ID NO:18), Exon 4 (SEQ ID NO:19, SEQ ID NO:20), Exon 5 (SEQ ID NO:21, SEQ ID NO:22), Exon 6 (SEQ ID NO:23, SEQ ID NO:24), Exon 7 (SEQ ID NO:25, SEQ ID NO:26), Exon 8 (SEQ ID NO:27, SEQ ID NO:28), Exon 9 (SEQ ID NO:29, SEQ ID NO:30), Exon 10 (SEQ ID NO:31, SEQ ID NO:32), Exon 11 (SEQ ID NO:33, SEQ ID NO:34), Exon 12 (SEQ ID NO:35, SEQ ID NO:36), Exon 13 (SEQ ID NO:37, SEQ ID NO:38), Exon 14 (SEQ ID NO:39, SEQ ID NO:40), Exon 15 (SEQ ID NO:41, SEQ ID NO:42), Exon 16 (SEQ ID NO:43, SEQ ID NO:44) and Exon 17 (SEQ ID NO:45).

FIG. 13 shows the exon structure of the Hum-Fli-1 gene, including Exon 1 (SEQ ID NO:46), Exon 2 (SEQ ID NO:47, SEQ ID NO:48), Exon 3 (SEQ ID NO:49, SEQ ID NO:50), Exon 4 (SEQ ID NO:51, SEQ ID NO:52), Exon 5 (SEQ ID NO:53, SEQ ID NO:54), Exon 6 (SEQ ID NO:55, SEQ ID NO:56), Exon 7 (SEQ ID NO:57, SEQ ID NO:58), Exon 8 (SEQ ID NO:59, SEQ ID NO:60), and Exon 9 (SEQ ID NO:61, SEQ ID NO:62).

FIG. 14 shows exon juxtapositions of the Ews and Hum-Fli-1 genes and the respective junction sequences of the resultant fusion transcripts SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75 and SEQ ID NO:77 with corresponding proteins encoded thereby SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76 and SEQ ID NO:78.

FIG. 15 shows a juxtaposition of the Ews and Hum-Fli-1 genes, between which an unknown (alien) sequence is interposed (SEQ ID NO:79 and SEQ ID NO:80).

FIG. 16 shows four juxtapositions of the Ews and Hum-Fli-1 genes and the resultant junction sequences, SEQ ID NO:81/SEQ ID NO:82, SEQ ID NO:83/SEQ ID NO:84, SEQ ID NO:85/SEQ ID NO:86, SEQ ID NO:87/SEQ ID NO:88, SEQ ID NO:128/SEQ ID NO:129, SEQ ID NO:95/SEQ ID NO:96, SEQ ID NO:93/SEQ ID NO:94, and SEQ ID NO:91/SEQ ID NO:92, as respectively appear in the figure.

FIG. 17 shows the DNA sequence of the promoter region of the Ews gene (SEQ ID NO:97).

FIG. 18 shows exon juxtapositions of the Ews and Erg genes and the junction sequences of the resultant fusion transcripts SEQ ID NO:98/SEQ ID NO:99, SEQ ID NO:100/SEQ ID NO:101, SEQ ID NO:102/SEQ ID NO:103, and SEQ ID NO:104/SEQ ID NO:105, as respectively appear in the figure.

FIG. 19 shows a restriction map of the EWSR1 region of the Ews gene and the positions of the breakpoint in two tumor cell lines.

FIG. 20A shows Southern blot analysis of control DNA and of DNA from the STMM cell line SU-CCS-1.

FIG. 20B shows Northern blot analysis of RNA from the STMM cell line SU-CCS-1 and from two control cell lines.

FIG. 20C shows analysis of amplified products from control RNA and from the STMM cell line SU-CCS-1.

FIG. 21 shows PCR-reverse transcriptase detection of chimeric transcripts on agarose gel.

FIG. 22 (SEQ ID NO:106) shows the DNA sequence of a fusion gene of Ews and Atf-1 and the amino acid sequence (SEQ ID NO:107) encoded by the DNA.

FIG. 23 shows the partial sequence of the cDNA of the Ews and Atf-1 genes (SEQ ID NO:111/SEQ ID NO:112, SEQ ID NO:113/SEQ ID NO:114, SEQ ID NO:115/SEQ ID NO:116) and of hybrid DNA of the Ews and Atf-1 genes (SEQ ID NO:108 and SEQ ID NO:109/SEQ ID NO:110) in the junction region.

FIG. 24 shows the translation products of the four cDNAs shown in FIG. 23.

In a first embodiment, a method for detecting such a gene includes the following steps:

the treatment of a biological specimen derived from tumor cells of a patient that are likely to have a chromosomal translocation t(11;22), in such a way as to render the nucleic acids that it contains capable of hybridizing with a probe;

putting at least one probe of the invention specific to either part of the nucleotide sequence of the Ews gene or the nucleotide sequence of a fusion gene resulting from the translocation t(11;22) into contact with the biological specimen, under conditions enabling the formation of hybridization complexes between the probe or probes and the target DNA or RNA contained in the specimen;

the determination by any suitable means of the hybrids possibly formed.
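The principle of a fusion-specific probe can be sketched in silico: a probe complementary to the junction finds a perfect target only in the fused sequence, not in either normal gene. The sketch below treats hybridization as exact complementarity, which is a simplifying assumption (real hybridization tolerates mismatches depending on stringency). All sequences and function names are hypothetical placeholders, not the sequences disclosed in the figures.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def hybridizes(probe, target):
    """True if the probe is perfectly complementary to a stretch of the target."""
    return reverse_complement(probe) in target


def junction_probe(five_prime_part, three_prime_part, flank=8):
    """Build a probe complementary to `flank` bases on each side of the junction."""
    return reverse_complement(five_prime_part[-flank:] + three_prime_part[:flank])


# Toy sequences standing in for the retained parts of each partner gene.
ews_part = "ATGGCGTCCACGGATTACAG"
fli_part = "TCCTGACCCGTCAGGATTTG"
fusion = ews_part + fli_part
normal_ews = ews_part + "GTAAGTCCCTTTGGGAAACC"
normal_fli = "CCAGTTTGGAACCATGCAAT" + fli_part

probe = junction_probe(ews_part, fli_part)
print(hybridizes(probe, fusion), hybridizes(probe, normal_ews), hybridizes(probe, normal_fli))
```

Because half of the probe matches each partner, stringent conditions would be needed in practice so that partial pairing with a normal gene does not give a signal.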

This method may be employed in tests on a membrane or a slide or any other suitable substrate, by the dot blot or Southern blot methods or by filtration methods.

A second embodiment of such a method seeks to detect the transcript of a fusion gene resulting from the translocation t(11;22); this method consists of performing reverse transcription with the aid of an appropriate synthetic oligonucleotide to obtain a corresponding cDNA from the mRNA extracted from a biological specimen taken from tumor cells of a patient likely to have a chromosomal translocation t(11;22); then amplifying this cDNA, with the aid of DNA polymerase and appropriate primers, by an enzymatic amplification process known as PCR that consists of repeating the cycles of DNA denaturation, hybridizing the primers and extending from the primers, a sufficient number of times to increase the quantity of the starting sequence in an exponential proportion relative to the number of cycles implemented. The amplification products are analyzed, for example by electrophoresis, to detect the presence of a product corresponding to one or the other of the genes involved in the translocation t(11;22), or a fusion gene. Methods of detecting the amplified products by adsorption on microslides are also possible.
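The exponential relationship between cycle number and product quantity can be made concrete with the usual idealized model, in which each cycle multiplies the template by (1 + E), where E is the per-cycle efficiency between 0 and 1. This model and the function name below are assumptions added for illustration; the text itself gives no efficiency figures.

```python
def expected_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: initial_copies * (1 + efficiency) ** cycles.

    efficiency = 1.0 means perfect doubling at every cycle; real reactions
    fall below this and plateau in late cycles, which this model ignores.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# Ten starting cDNA molecules after 30 cycles, at two assumed efficiencies.
print(expected_copies(10, 30))
print(expected_copies(10, 30, efficiency=0.9))
```

With perfect doubling, 30 cycles give roughly a billion-fold increase, which is why a transcript present in a few tumor cells can become detectable on a gel.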

The invention also relates to the detection of a chimera protein coded by a fusion gene resulting from the translocation t(11;22). Such a method includes the following steps:

the treatment of a biological specimen deriving from a patient whose tumor cells are likely to have a translocation t(11;22), in such a way as to render the proteins that it contains accessible to antibodies;

putting the biological specimen into contact with at least one antibody of the invention specific to a chimera protein of the invention corresponding to the translocation t(11;22), under conditions enabling the formation of immunological complexes between the antibody or antibodies and the proteins present in the cells of the specimen;

the determination by any suitable means of the immunological complexes possibly formed.

Specifically, the detection of fusion DNA or RNA or of a chimera protein enables diagnosis of Ewing's sarcoma and related tumors in subjects who have small cell tumors.

Consequently, the subject of the invention is a method of diagnosing Ewing's sarcoma and related tumors, consisting of detecting the presence of a translocation t(11;22) in the tumor cells, from a biological specimen derived from tumor cells of a patient that are likely to have a chromosomal translocation t(11;22), the specimen to be treated in such a way as to render the nucleic acids that it contains capable of hybridizing with a probe. Such a method includes the following steps:

putting at least one probe of the invention, optionally labeled, specific to either part of the nucleotide sequence of the Ews gene or the nucleotide sequence of a fusion gene resulting from the translocation t(11;22), into contact with the biological specimen, treated in such a way that the cells that it contains are lysed and optionally that the nucleic acids contained in said cells are fragmented with the aid of a restriction enzyme, under conditions enabling the formation of hybridization complexes between the probe or probes and the target DNA or RNA contained in the specimen;

the determination by any suitable means of the hybrids possibly formed, to detect the presence of a product corresponding to the fusion gene resulting from the translocation t(11;22).

As before, this method may be employed in tests on a membrane or on a slide or any other suitable substrate by the dot blot or Southern blot or filtration methods.

In another embodiment, a method for diagnosing Ewing's sarcoma and related tumors consists of performing reverse transcription with the aid of an appropriate synthetic oligonucleotide to obtain a corresponding cDNA from the mRNA extracted from a biological specimen taken from tumor cells of a patient likely to have a chromosomal translocation t(11;22); then amplifying this cDNA, with the aid of DNA polymerase and appropriate primers, by an enzymatic amplification process known as PCR that consists of repeating the cycles of DNA denaturation, hybridizing the primers and extending from the primers, a sufficient number of times to increase the quantity of the starting sequence in an exponential proportion relative to the number of cycles implemented. The amplification products are analyzed, for example by electrophoresis, to detect the presence of a product corresponding to a fusion gene of the translocation t(11;22). Methods of detecting the amplified products by adsorption on microslides are also possible.
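The diagnostic logic of this embodiment can be sketched as an in-silico amplification: with a forward primer in the 5' partner and a reverse primer in the 3' partner, only a chimeric cDNA contains both primer sites and yields a product. The sketch assumes exact primer matching on the sense strand; the primers, sequences and function names are hypothetical and are not those used by the inventors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward_primer, reverse_primer):
    """Length of the product expected on the sense-strand template, or None.

    The forward primer must occur in the template, and the reverse complement
    of the reverse primer must occur downstream of it.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(site) - start


# Toy cDNAs: one chimeric transcript and the two normal transcripts.
ews_5 = "ATGGCGTCCACGGATTACAG"
fli_3 = "TCCTGACCCGTCAGGATTTG"
fusion_cdna = ews_5 + fli_3
normal_ews_cdna = ews_5 + "GTAAGTCCCTTTGGGAAACC"
normal_fli_cdna = "CCAGTTTGGAACCATGCAAT" + fli_3

forward = ews_5[:8]                       # primer in the 5' partner
reverse = reverse_complement(fli_3[-8:])  # primer in the 3' partner

for name, cdna in [("fusion", fusion_cdna), ("Ews", normal_ews_cdna), ("Fli", normal_fli_cdna)]:
    print(name, amplicon_length(cdna, forward, reverse))
```

Different exon juxtapositions would give products of different lengths with the same primer pair, which is how electrophoresis can distinguish several types of chimeric transcript.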

In another embodiment, a method for diagnosing Ewing's sarcoma and related tumors consists of immunologically detecting the presence of at least one protein coded by one or more fusion genes resulting from the translocation t(11;22), from a cell specimen put into contact with one or more antibodies of the invention specific to fusion proteins corresponding to the translocation t(11;22). Such a method includes the following steps:

the treatment of a biological specimen deriving from a patient whose tumor cells are likely to have a translocation t(11;22), in such a way as to render the proteins that it contains accessible to antibodies;

putting the biological specimen into contact with at least one antibody of the invention directed against at least one chimera protein resulting from the translocation t(11;22), under conditions enabling the formation of immunological complexes between the antibody or antibodies and the proteins present in the cells of the specimen;

the determination by any suitable means of the immunological complexes possibly formed, to detect the presence of a product corresponding to the fusion gene resulting from the translocation t(11;22).

Kits for implementing these methods can advantageously be prepared. These kits, in the case of a method by hybridization, contain probes according to the invention as well as control specimens of DNA or RNA. In the case of a method by reverse transcription and PCR, they contain the appropriate oligonucleotides for implementing each of these techniques as well as control specimens of DNA or RNA; in the case of an immunological method, they contain the monoclonal antibodies as well as control specimens of known reactivity.

The invention also relates to the use of DNA sequences of the invention to prepare anti-sense nucleotides, or analogs, having an anti-tumor activity, which by hybridization with all or part of the fusion gene inhibit its transcription and thus prevent the production of mRNA and chimera proteins, or which, in another embodiment, hybridize with the transcribed mRNA and thus inhibit the production of chimera proteins.
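An anti-sense oligonucleotide directed against the transcript is, in sequence terms, the reverse complement of the chosen mRNA window. The minimal sketch below derives a DNA anti-sense sequence spanning a fusion junction. The function name, the flank length and the toy mRNA are hypothetical, and the sketch says nothing about which windows would be effective in cells.

```python
# RNA base in the target mRNA -> complementary DNA base in the oligonucleotide.
RNA_TO_ANTISENSE_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense_dna(mrna, junction, flank=10):
    """Return a DNA oligonucleotide (5' to 3') complementary to the mRNA
    window covering `flank` bases on each side of the junction index.

    `junction` is the 0-based index of the first base contributed by the
    3' partner. The window is clipped at the ends of the mRNA.
    """
    start = max(0, junction - flank)
    window = mrna[start:junction + flank]
    return "".join(RNA_TO_ANTISENSE_DNA[base] for base in reversed(window))


# Toy chimeric mRNA: six bases from each hypothetical partner.
toy_mrna = "AUGGCC" + "UUAGGA"
print(antisense_dna(toy_mrna, junction=6, flank=3))
```

Centering the window on the junction is a design choice meant to favor the chimeric transcript over the two normal transcripts, each of which matches only half of the oligonucleotide.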

Consequently, the invention relates to the use of a nucleic acid or an analog of this nucleic acid, capable of hybridizing with the nucleotide sequence of a hybrid DNA of the invention, for preparing a therapeutic agent that inhibits the expression of a fusion gene resulting from a chromosomal translocation t(11;22)(q24;q12), in tumor cells of patients suffering from Ewing's sarcoma or related tumors.

Accordingly, the subject of the invention is also a therapeutic agent for inhibiting the expression of a fusion gene resulting from a chromosomal translocation t(11;22)(q24;q12), in tumor cells of patients suffering from Ewing's sarcoma or related tumors, characterized in that it is essentially constituted by a hybrid DNA of the invention, or an analog of this DNA, capable of hybridizing with the nucleotide sequence of a fusion gene resulting from the chromosomal translocation t(11;22)(q24;q12).

In another embodiment, the invention relates to the use of a nucleic acid or an analog of this nucleic acid, capable of hybridizing with the nucleotide sequence of a hybrid RNA of the invention, for preparing a therapeutic agent that inhibits the translation of chimera proteins resulting from a chromosomal translocation t(11;22)(q24;q12), in tumor cells of patients suffering from Ewing's sarcoma or related tumors.

Hence the subject of the invention is also a therapeutic agent for inhibiting the translation of chimera proteins resulting from a chromosomal translocation t(11;22)(q24;q12), in tumor cells of patients suffering from Ewing's sarcoma or related tumors, characterized in that it is essentially constituted by a hybrid RNA of the invention, or an analog of this RNA, capable of hybridizing with the nucleotide sequence of the RNA originating in a fusion gene resulting from the chromosomal translocation t(11;22)(q24;q12).

The invention also has as its subject the hybrid DNAs resulting from the translocation t(21;22) associated with approximately 10% of cases of Ewing's sarcoma or related tumors, essentially constituted by the fusion of a part of the nucleotide sequence of the Ews gene, and of the part of the nucleotide sequence of the Erg gene located at the level of the breakpoint of chromosome 21 in the translocation. More precisely, the invention relates to the hybrid DNAs including the part of the nucleotide sequence of the Erg gene from the region at the level of which the breakpoint of chromosome 21 is located in this translocation, up to its 3' end.

The inventors have studied in detail the mechanisms giving rise to the various fusion genes of the translocation t(21;22).

Advantageously, the hybrid DNAs according to the invention corresponding to the fusion products resulting from the recurrent chromosomal translocation t(21;22) are essentially constituted by a part of the nucleotide sequence of the cDNA of the Ews gene, and more precisely the nucleotide sequence of a cDNA resulting from the fusion of the Ews and Erg genes.

Nucleotide probes or their homologues, capable of hybridizing with all or part of the nucleotide sequence of the Ews or Erg genes or with the cDNA of one of these genes, have been prepared. Among them, probes capable of hybridizing specifically with a part of the Ews gene or a part of the Erg gene have been selected.

Probes that are complementary to all or part of the hybrid DNAs, in particular to the part of the Ews and Erg genes altered by the translocation t(21;22), or to the mRNA or cDNA of the fusion genes, have also been obtained with a view to detecting, by hybridization, the possible presence of a translocation t(21;22) in the tumor cells of a subject.

Synthetic oligonucleotides have been prepared from the nucleotide sequences of the Ews and Erg genes and of the products resulting from the fusion of these two genes, in order to prepare the cDNA corresponding to the fusion zone by reverse transcription of mRNA originating in a specimen to be analyzed. In vitro gene amplification of this cDNA by PCR, with the aid of oligonucleotide primers, enables the analysis of the amplified products by simple radioactive methods, by colorimetric methods of the gel electrophoresis and ethidium bromide staining type, or by immunological or fluorographic detection.

Consequently, the invention relates to the nucleotide sequences, or their analogs, that constitute genetic probes capable of hybridizing with the nucleotide sequence of the Ews gene, or the hybrid DNAs resulting from the fusion of the Ews and Erg genes, or their mRNAs and cDNAs, as well as the oligonucleotides originating in these sequences and constituting primers for performing reverse transcription of RNA or the implementation of a PCR gene amplification process.

The chromosomal translocation t(21;22) observed in neurectodermic tumors gives rise to hybrid fusion genes capable of coding for the chimera proteins that have preserved the N-terminal part of the EWS protein coded by the Ews gene and the C-terminal part of the ERG protein coded by the Erg gene.

As in the case of the chimera proteins resulting from the fusion of the Ews and Hum-Fli-1 genes, the chimera proteins resulting from the translocation t(21;22) preserve the NTD-EWS domain, which is in phase with the DNA-binding domain of the ERG protein.

The amino acid sequence of the chimera proteins can be deduced from the fusion cDNAs resulting from the translocation t(21;22); certain of these proteins have been produced in vitro, in order in particular to prepare polyclonal and monoclonal antibodies capable of reacting with the cells that produce these proteins, and that consequently have the translocation t(21;22). Accordingly, the invention also relates to the chimera proteins resulting from the chromosomal translocation t(21;22), as well as the antibodies for the immunological detection of the presence of these proteins, and more particularly of a chimera protein in a biological specimen taken from a subject likely to carry a chromosomal translocation t(21;22). The invention also relates to the methods of detecting a fusion gene resulting from the chromosomal translocation t(21;22).

In a first embodiment, a method for detecting such a gene includes the following steps:

the treatment of a biological specimen derived from tumor cells of a patient that are likely to have a chromosomal translocation t(21;22), in such a way as to render the nucleic acids that it contains capable of hybridizing with a probe;

putting at least one probe of the invention, specific to either part of the nucleotide sequence of the Ews gene or the nucleotide sequence of a fusion gene corresponding to the translocation t(21;22), into contact with the biological specimen, under conditions enabling the formation of hybridization complexes between the probe or probes and the target DNA or RNA contained in the specimen;

the determination by any suitable means of the hybrids possibly formed.

This method may be employed in tests on a membrane or a slide or any other suitable substrate, by the dot blot or Southern blot methods or by filtration methods.

A second embodiment of such a method seeks to detect the transcript of a fusion gene resulting from the translocation t(21;22); this method consists of performing reverse transcription with the aid of an appropriate synthetic oligonucleotide to obtain a corresponding cDNA from the mRNA extracted from a biological specimen taken from tumor cells of a patient likely to have a chromosomal translocation t(21;22); then amplifying this cDNA, with the aid of DNA polymerase and appropriate primers, by an enzymatic amplification process known as PCR that consists of repeating cycles of DNA denaturation, primer hybridization and primer extension a sufficient number of times to increase the quantity of the starting sequence exponentially with the number of cycles implemented. The amplification products are analyzed, for example by electrophoresis, to detect the presence of a product corresponding to one or the other of the genes involved in the translocation t(21;22), or a fusion gene. Methods of detecting the amplified products by adsorption on microslides are also possible.
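The exponential relationship between the number of PCR cycles and the quantity of product can be sketched as follows. This is an illustration only: the function name and the `efficiency` parameter (the fraction of templates copied in each cycle) are assumptions introduced here and do not appear in the description, which states only that the increase is exponential.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of template copies after a given number of PCR cycles.

    efficiency is a hypothetical per-cycle yield: 1.0 means every template
    is copied once per cycle (perfect doubling), 0.0 means no amplification.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    # Each cycle multiplies the template count by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n))
```

With perfect doubling, a single starting molecule gives 2 to the power n copies after n cycles; real reactions fall short of this and eventually plateau.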

The invention also relates to the detection of a chimera protein coded by a fusion gene resulting from the translocation t(21;22). Such a method includes the following steps:

the treatment of a biological specimen deriving from a patient whose tumor cells are likely to have a translocation t(21;22), in such a way as to render the proteins that it contains accessible to antibodies;

putting the biological specimen into contact with at least one antibody specific to the chimera protein of the invention corresponding to the translocation t(21;22), under conditions enabling the formation of immunological complexes between the antibody or antibodies and the proteins present in the cells of the specimen;

the determination by any suitable means of the immunological complexes possibly formed.

Specifically, the detection of fusion DNA or RNA resulting from the translocation t(21;22) or of a corresponding chimera protein enables the diagnosis of Ewing's sarcoma and related tumors in subjects who have small cell tumors.

Consequently, the subject of the invention is a method for diagnosing Ewing's sarcoma and related tumors, consisting of detecting the presence of a translocation t(21;22) in the tumor cells, from a biological specimen derived from tumor cells of a patient that are likely to have a chromosomal translocation t(21;22), the specimen being treated in such a way as to render the nucleic acids that it contains capable of hybridizing with a probe. Such a method includes the following steps:

putting at least one probe of the invention, optionally labeled, specific to either part of the nucleotide sequence of the Ews gene or the nucleotide sequence of a fusion gene resulting from the translocation t(21;22), into contact with the biological specimen, treated in such a way that the cells it contains are lysed and optionally that the nucleic acids contained in said cells are fragmented with the aid of a restriction enzyme, under conditions enabling the formation of hybridization complexes between the probe or probes and the target DNA or RNA contained in the specimen;

the determination by any suitable means of the hybrids possibly formed, to detect the presence of a product corresponding to the fusion gene resulting from the translocation t(21;22).

This method may be employed in tests on a membrane or a slide or any other suitable substrate, by the dot blot or Southern blot methods or by filtration methods.

In another embodiment, a method for diagnosing Ewing's sarcoma and related tumors consists of performing reverse transcription with the aid of an appropriate synthetic oligonucleotide to obtain a corresponding cDNA from the mRNA extracted from a biological specimen taken from tumor cells of a patient likely to have a chromosomal translocation t(21;22); then amplifying this cDNA, with the aid of DNA polymerase and appropriate primers, by an enzymatic amplification process known as PCR that consists of repeating cycles of DNA denaturation, primer hybridization and primer extension a sufficient number of times to increase the quantity of the starting sequence exponentially with the number of cycles implemented. The amplification products are analyzed, for example by electrophoresis, to detect the presence of a product corresponding to a fusion gene of the translocation t(21;22). Methods of detecting the amplified products by adsorption on microslides are also possible.

In another embodiment, a method for diagnosing Ewing's sarcoma and related tumors consists of immunologically detecting, from a cell specimen put into contact with one or more antibodies of the invention that are specific to fusion proteins, the presence of at least one protein coded by one or more fusion genes resulting from the translocation t(21;22). Such a method includes the following steps:

the treatment of a biological specimen deriving from a patient whose tumor cells are likely to have a translocation t(21;22), in such a way as to render the proteins that it contains accessible to monoclonal antibodies;

putting the biological specimen into contact with at least one antibody of the invention, directed against at least one chimera protein corresponding to the translocation t(21;22), under conditions enabling the formation of immunological complexes between the antibody or antibodies and the proteins present in the cells of the specimen;

the determination by any suitable means of the immunological complexes possibly formed, to detect the presence of a product corresponding to the fusion gene resulting from the translocation t(21;22).

Since translocation t(21;22) is associated with approximately 10% of cases of Ewing's sarcoma, and translocation t(11;22) is associated with at least 80% of the cases of Ewing's sarcoma, it is advantageous, to enable diagnosis of Ewing's sarcoma, to simultaneously employ the methods that seek to detect the products resulting from these two translocations in the tumor cells of patients likely to exhibit these chromosomal translocations.
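The benefit of testing for both translocations at once can be illustrated with a simple inclusion-exclusion calculation. The sketch below assumes, by default, that the two translocations do not occur together in the same tumor; that assumption, like the function name, is introduced here for illustration and is not stated in the description.

```python
def combined_coverage(pct_a: int, pct_b: int, pct_both: int = 0) -> int:
    """Percentage of cases carrying at least one of two markers.

    pct_both is the percentage of cases carrying both markers; the default
    of 0 encodes the hypothetical assumption that they never co-occur.
    """
    if not 0 <= pct_both <= min(pct_a, pct_b):
        raise ValueError("pct_both must lie between 0 and the smaller percentage")
    return pct_a + pct_b - pct_both


if __name__ == "__main__":
    # Figures quoted in the text: about 10% for t(21;22), at least 80% for t(11;22).
    print(combined_coverage(10, 80))
```

Under that assumption the combined tests would cover about 90% of cases, which is a lower bound because the 80% figure is itself given as a minimum.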

Kits for implementing these methods can advantageously be prepared. These kits, in the case of a method by hybridization, contain probes according to the invention as well as control specimens of DNA or RNA. In the case of a method by reverse transcription and PCR, they contain the appropriate oligonucleotides for implementing each of these techniques as well as control specimens of DNA or RNA; in the case of an immunological method, they contain the monoclonal antibodies as well as control specimens of known reactivity.

The invention also relates to the use of DNA sequences of the invention to prepare anti-sense nucleotides, or analogs, having an anti-tumor activity, which by hybridization with all or part of the fusion gene inhibit its transcription and thus prevent the production of mRNA and chimera proteins, and/or which, in another embodiment, hybridize with the transcribed mRNA and thus inhibit the production of chimera proteins.
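The principle of an anti-sense oligonucleotide directed against the fusion zone of a transcript can be sketched as a reverse-complement calculation over a window spanning the junction. The sequences used below are hypothetical placeholders, not sequences of the invention, and the function names are introduced here for illustration only.

```python
# Base pairing between an mRNA target and an anti-sense DNA oligonucleotide.
RNA_TO_ANTISENSE_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def junction_window(upstream: str, downstream: str, flank: int) -> str:
    """Return `flank` bases on each side of the fusion point of a transcript."""
    if flank < 1 or flank > min(len(upstream), len(downstream)):
        raise ValueError("flank must fit inside both partner sequences")
    return upstream[-flank:] + downstream[:flank]


def antisense_oligo(mrna_segment: str) -> str:
    """Anti-sense DNA oligonucleotide (written 5' to 3') for an mRNA segment."""
    segment = mrna_segment.upper()
    unknown = set(segment) - set(RNA_TO_ANTISENSE_DNA)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # Reverse the target, then pair each base, to read the oligo 5' to 3'.
    return "".join(RNA_TO_ANTISENSE_DNA[base] for base in reversed(segment))


if __name__ == "__main__":
    # Hypothetical partner sequences standing in for the two sides of a fusion.
    window = junction_window("AUGGCGUCCA", "GGAUCCUUAA", flank=4)
    print(window, antisense_oligo(window))
```

An oligonucleotide that spans the junction pairs fully only with the fusion transcript, since each normal transcript contains only one half of the window.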

The invention consequently relates to the use of a nucleic acid or an analog of this nucleic acid, capable of hybridizing with the nucleotide sequence of a hybrid DNA corresponding to the translocation t(21;22) of the invention, for preparing a therapeutic agent that inhibits the expression of a fusion gene resulting from said chromosomal translocation, in tumor cells of patients suffering from Ewing's sarcoma or related tumors.

Accordingly, the subject of the invention is also a therapeutic agent for inhibiting the expression of a fusion gene resulting from the chromosomal translocation t(21;22) in tumor cells of patients suffering from Ewing's sarcoma or related tumors, characterized in that it is essentially constituted by a hybrid DNA of the invention corresponding to said translocation, or an analog of this DNA, capable of hybridizing with the nucleotide sequence of a fusion gene resulting from the chromosomal translocation t(21;22).

In another embodiment, the invention relates to the use of a nucleic acid or an analog of this nucleic acid, capable of hybridizing with the nucleotide sequence of a hybrid RNA according to the invention corresponding to the translocation t(21;22), for preparing a therapeutic agent that inhibits the translation of chimera proteins resulting from such a chromosomal translocation, in tumor cells of patients suffering from Ewing's sarcoma or related tumors.

Hence, the subject of the invention is also a therapeutic agent for inhibiting the translation of chimera proteins resulting from a chromosomal translocation t(21;22), in tumor cells of patients suffering from Ewing's sarcoma or related tumors, characterized in that it is essentially constituted by a hybrid RNA of the invention corresponding to the translocation t(21;22), or an analog of this RNA, capable of hybridizing with the nucleotide sequence of the RNA originating in a fusion gene resulting from this chromosomal translocation.

Since the translocation t(21;22) is associated with approximately 10% of cases of Ewing's sarcoma, and the translocation t(11;22) is associated with at least 80% of the cases of Ewing's sarcoma, the invention advantageously relates to a therapeutic agent, essentially constituted by either therapeutic agents that inhibit the expression of the fusion genes resulting from these two chromosomal translocations, or therapeutic agents that inhibit the translation of the chimera proteins resulting from these two chromosomal translocations.

Finally, the subject of the invention is the hybrid DNAs resulting from the translocation t(12;22)(q13;q12), essentially constituted by the fusion of a part of the nucleotide sequence of the Ews gene and the part of the nucleotide sequence of the Atf-1 gene located at the level of the breakpoint of chromosome 12 in this translocation. More precisely, the invention relates to the hybrid DNAs including the part of the nucleotide sequence of the Atf-1 gene from the region at the level of which the breakpoint of chromosome 12 is located in this translocation, up to its 3' end.

The inventors have studied in detail the mechanisms giving rise to the various fusion genes of the translocation t(12;22).

Advantageously, the hybrid DNAs according to the invention corresponding to the fusion products resulting from the recurrent chromosomal translocation t(12;22) are essentially constituted by a part of the nucleotide sequence of the cDNA of the Ews gene, and more precisely the nucleotide sequence of a cDNA resulting from the fusion of the Ews and Atf-1 genes.

Nucleotide probes or their homologues, capable of hybridizing with all or part of the nucleotide sequence of the Ews or Atf-1 genes or with the cDNA of one of these genes, have been prepared. Among them, probes capable of hybridizing specifically with a part of the Ews gene or a part of the Atf-1 gene have been selected.

Probes that are complementary to all or part of the hybrid DNAs, in particular to the part of the Ews and Atf-1 genes altered by the translocation t(12;22), or to the mRNA or cDNA of the fusion genes, have also been obtained with a view to detecting, by hybridization, the possible presence of a translocation t(12;22) in the tumor cells of a subject.

Synthetic oligonucleotides have been prepared from the nucleotide sequences of the Ews and Atf-1 genes and of the products resulting from the fusion of these two genes, in order to prepare the cDNA corresponding to the fusion zone by reverse transcription of mRNA originating in a specimen to be analyzed. In vitro gene amplification of this cDNA by PCR, with the aid of oligonucleotide primers, enables the analysis of the amplified products by simple radioactive methods, by colorimetric methods of the gel electrophoresis and ethidium bromide staining type, or by immunological or fluorographic detection.

Consequently, the invention relates to the nucleotide sequences, or their analogs, that constitute genetic probes capable of hybridizing with the nucleotide sequence of the Ews gene, or the hybrid DNAs resulting from the fusion of the Ews and Atf-1 genes, or their mRNAs and cDNAs, as well as the oligonucleotides originating in these sequences and constituting primers for performing reverse transcription of RNA or the implementation of a PCR gene amplification process.

The chromosomal translocation t(12;22) observed in soft tissue malignant melanoma (STMM) tumors gives rise to hybrid fusion genes capable of coding for the chimera proteins that have preserved the N-terminal part of the EWS protein coded by the Ews gene and the C-terminal part of the ATF-1 protein coded by the Atf-1 gene.

The amino acid sequence of the chimera proteins can be deduced from the fusion cDNAs resulting from the translocation t(12;22); one of these proteins has been produced in vitro, in order in particular to prepare polyclonal or monoclonal antibodies capable of reacting with the cells that produce this protein, and that consequently exhibit the translocation t(12;22). The invention accordingly also relates to the chimera proteins resulting from the chromosomal translocation t(12;22), as well as the antibodies for immunological detection of the presence of these proteins, and more particularly of a chimera protein, in a biological specimen taken from a subject likely to carry a chromosomal translocation t(12;22).

The invention also relates to the methods of detecting a fusion gene resulting from the chromosomal translocation t(12;22).

In a first embodiment, a method for detecting such a gene includes the following steps:

the treatment of a biological specimen derived from tumor cells of a patient that are likely to have a chromosomal translocation t(12;22), in such a way as to render the nucleic acids that it contains capable of hybridizing with a probe;

putting at least one probe of the invention specific to either part of the nucleotide sequence of the Ews gene or the nucleotide sequence of a fusion gene resulting from the translocation t(12;22), or a mixture of these probes, into contact with the biological specimen, under conditions enabling the formation of hybridization complexes between the probe or probes and the target DNA or RNA contained in the specimen;

the determination by any suitable means of the hybrids possibly formed.

This method may be employed in tests on a membrane or a slide or any other suitable substrate, by the dot blot or Southern blot methods or by filtration methods.

A second embodiment of such a method seeks to detect the transcript of a fusion gene resulting from the translocation t(12;22); this method consists of performing reverse transcription with the aid of an appropriate synthetic oligonucleotide to obtain a corresponding cDNA from the mRNA extracted from a biological specimen taken from tumor cells of a patient likely to have a chromosomal translocation t(12;22); then amplifying this cDNA, with the aid of DNA polymerase and appropriate primers, by an enzymatic amplification process known as PCR that consists of repeating cycles of DNA denaturation, primer hybridization and primer extension a sufficient number of times to increase the quantity of the starting sequence exponentially with the number of cycles implemented. The amplification products are analyzed, for example by electrophoresis, to detect the presence of a product corresponding to one or the other of the genes involved in the translocation t(12;22), or a fusion gene. Methods of detecting the amplified products by adsorption on microslides are also possible.

The invention also relates to the detection of a chimera protein coded by a fusion gene resulting from the translocation t(12;22). Such a method includes the following steps:

the treatment of a biological specimen deriving from a patient whose tumor cells are likely to have a translocation t(12;22), in such a way as to render the proteins that it contains accessible to antibodies;

putting the biological specimen into contact with at least one antibody of the invention specific to the chimera protein of the invention corresponding to the translocation t(12;22), under conditions enabling the formation of immunological complexes between the antibody or antibodies and the proteins present in the cells of the specimen;

the determination by any suitable means of the immunological complexes possibly formed.

Specifically, the detection of fusion DNA or RNA or of a chimera protein enables diagnosis of STMM.

Consequently, the subject of the invention is a method of diagnosing STMM, consisting of detecting the presence of a translocation t(12;22) in the tumor cells, from a biological specimen derived from tumor cells of a patient that are likely to have a chromosomal translocation t(12;22), the specimen being treated in such a way as to render the nucleic acids that it contains capable of hybridizing with a probe. Such a method includes the following steps:

putting at least one probe of the invention, optionally labeled, specific to either part of the nucleotide sequence of the Ews gene or the nucleotide sequence of a fusion gene resulting from the translocation t(12;22), into contact with the biological specimen, treated in such a way that the cells that it contains are lysed and optionally that the nucleic acids contained in said cells are fragmented with the aid of a restriction enzyme, under conditions enabling the formation of hybridization complexes between the probe or probes and the target DNA or RNA contained in the specimen;

the determination by any suitable means of the hybrids possibly formed, to detect the presence of a product corresponding to the fusion gene resulting from the translocation t(12;22).

As before, this method may be employed in tests on a membrane or on a slide or any other suitable substrate by the dot blot or Southern blot or filtration methods.

In another embodiment, a method for diagnosing STMM consists of performing reverse transcription with the aid of an appropriate synthetic oligonucleotide to obtain a corresponding cDNA from the mRNA extracted from a biological specimen taken from tumor cells of a patient likely to have a chromosomal translocation t(12;22); then amplifying this cDNA, with the aid of DNA polymerase and appropriate primers, by an enzymatic amplification process known as PCR that consists of repeating cycles of DNA denaturation, primer hybridization and primer extension a sufficient number of times to increase the quantity of the starting sequence exponentially with the number of cycles implemented. The amplification products are analyzed, for example by electrophoresis, to detect the presence of a product corresponding to a fusion gene of the translocation t(12;22). Methods of detecting the amplified products by adsorption on microslides are also possible.

In another embodiment, a method for diagnosing STMM consists of immunologically detecting the presence of at least one protein coded by one or more fusion genes resulting from the translocation t(12;22), from a cell specimen put into contact with one or more antibodies of the invention specific to fusion proteins. Such a method includes the following steps:

the treatment of a biological specimen deriving from a patient whose tumor cells are likely to have a translocation t(12;22), in such a way as to render the proteins that it contains accessible to antibodies;

putting the biological specimen into contact with at least one antibody of the invention directed against at least one chimera protein resulting from the translocation t(12;22), under conditions enabling the formation of immunological complexes between the antibody or antibodies and the proteins present in the cells of the specimen;

the determination by any suitable means of the immunological complexes possibly formed, to detect the presence of a product corresponding to the fusion gene resulting from the translocation t(12;22).

Kits for implementing these methods can advantageously be prepared. These kits, in the case of a method by hybridization, contain probes according to the invention as well as control specimens of DNA or RNA. In the case of a method by reverse transcription and PCR, they contain the appropriate oligonucleotides for implementing each of these techniques as well as control specimens of DNA or RNA; in the case of an immunological method, they contain the monoclonal antibodies as well as control specimens of known reactivity.

The invention also relates to the use of DNA sequences of the invention, corresponding to a translocation t(12;22), to prepare anti-sense nucleotides, or analogs, having an anti-tumor activity, which by hybridization with all or part of the fusion gene inhibit its transcription and thus prevent the production of mRNA and chimera proteins, and/or which, in another embodiment, hybridize with the transcribed mRNA and thus inhibit the production of chimera proteins.

Consequently, the invention relates to the use of a nucleic acid or an analog of this nucleic acid, capable of hybridizing with the nucleotide sequence of a hybrid DNA of the invention, corresponding to a translocation t(12;22), for preparing a therapeutic agent that inhibits the expression of a fusion gene resulting from a chromosomal translocation t(12;22)(q13;q12), in tumor cells of patients suffering from STMM.

Accordingly, the subject of the invention is also a therapeutic agent for inhibiting the expression of a fusion gene resulting from a chromosomal translocation t(12;22)(q13;q12), in tumor cells of patients suffering from STMM, characterized in that it is essentially constituted by a hybrid DNA of the invention, corresponding to a translocation t(12;22), or an analog of this DNA, capable of hybridizing with the nucleotide sequence of a fusion gene resulting from the chromosomal translocation t(12;22)(q13;q12).

In another embodiment, the invention relates to the use of a nucleic acid or an analog of this nucleic acid, capable of hybridizing with the nucleotide sequence of a hybrid RNA of the invention, for preparing a therapeutic agent that inhibits the translation of chimera proteins resulting from a chromosomal translocation t(12;22)(q13;q12), in tumor cells of patients suffering from STMM.

Hence, the subject of the invention is also a therapeutic agent for inhibiting the translation of chimera proteins resulting from a chromosomal translocation t(12;22)(q13;q12), in tumor cells of patients suffering from STMM, characterized in that it is essentially constituted by a hybrid RNA of the invention, corresponding to a translocation t(12;22), or an analog of this RNA, capable of hybridizing with the nucleotide sequence of the RNA originating in a fusion gene resulting from the chromosomal translocation t(12;22)(q13;q12).

The invention also relates to the nucleic acid corresponding to the Ews gene and to the mRNA which originates from it and the cDNA that derives from it, as well as the protein for which it codes. In effect, the probes and primers prepared constitute tools that enable the detection of the normal Ews gene; this detection of the normal Ews gene can advantageously be combined with the methods of detecting the fusion genes resulting from the various translocations in which this gene is involved, so as to serve as a positive control.

The DNAs of the invention may be introduced into expression vectors derived from plasmids or viruses, with the object in particular of producing the proteins corresponding to these DNA sequences so as to prepare antibodies specific to these proteins or to have pharmacological study models available.

Further characteristics of the invention will become apparent from the ensuing description in conjunction with examples, it being understood that these examples do not in any way represent a limitation in the scope of the claims.

EXAMPLE 1 STUDY OF THE RECURRENT CHROMOSOMAL TRANSLOCATION t(11;22) ASSOCIATED WITH EWING'S SARCOMA

I--Cloning of Breakpoints

Using a panel of hybrid somatic cells, it has been demonstrated that the locus identified with the VIIIF2 probe and the locus coding for the leukemia inhibitory factor (LIF) lie on either side of, and close to, the breakpoint of chromosome 22 (O. Delattre, C. J. Azambuja, A. Aurias et al., Genomics 9, 721-727 (1991)). On the scale defined by Trask et al. (B. Trask, D. Pinkel, G. Van Den Engh, Genomics 5, 710-717 (1989)), the distance between these two loci has been estimated by fluorescence in situ hybridization (FISH) on interphase nuclei at between 1.5 and 2 megabases.

Differential screening of a bank of cosmids specific to chromosome 22 with various Alu-PCR products (D. L. Nelson, S. A. Ledbetter, L. Corbo, M. F. Victoria, R. Ramirez-Solis, T. D. Webster, D. H. Ledbetter, C. T. Caskey, Proc. Natl. Acad. Sci. USA 86, 6686-6690 (1989)) generated from four hybrid cells has led to the identification of three independent loci, which are telomeric to the breakpoint and close to the LIF locus (J. Zucman, O. Delattre, C. Desmaze, Genomics, in press). Two of them lie in a large 450 kilobase contig constructed beforehand by expansion of the LIF locus. By the bicolor FISH technique on interphase nuclei, the third locus, identified from the overlapping cosmids Cos5 and Cos6 (J. Zucman, O. Delattre, C. Desmaze, Genomics, in press), has been located between the VIIIF2 and LIF loci. The Cos5/Cos6 locus has been extended progressively by recurrent isolation of overlapping clones, using a bank of cosmids specific to chromosome 22.

In the course of this procedure, it has been demonstrated that two cosmids, named B6 and G9, straddle the breakpoint of chromosome 22 on derivative 11 of the translocation t(11;22) in A3EW2-3B and Alu6, two hybrid cells deriving from ES and PN tumors, respectively, and containing derivative 11 (A. H. M. Geurts van Kessel, C. Turc-Carel, A. Klein et al., Mol. Cell. Biol. 5, 427-429 (1985); F. Zhang, O. Delattre, G. Rouleau et al., Genomics 6, 174-177 (1990)).

The FISH technique with metaphase chromosomes of a PN cell line employed with one of the two cosmids confirms that the breakpoint has been crossed, because a fluorescent signal is then observed at derivative 22 of the translocation.

DNA fragments present as single copies in the human genome, near the breakpoint of chromosome 22, have been selected to analyze the DNA of a group of 20 ES and PN tumors. In all cases, a breakpoint is observed in the same region of approximately 7 Kb, which has been called EWSR1, standing for Ewing's sarcoma region 1.

A bank of cosmids made from the ICB104 cell line deriving from a PN tumor (F. Zhang, O. Delattre, G. Rouleau et al., Genomics 6, 174-177 (1990)) has been screened with probes originating in EWSR1. Three groups of overlapping cosmids were isolated, one corresponding to the normal chromosome 22 and the other two corresponding to each of the two derivatives of the translocation. Fragments from these cosmids that did not originate in chromosome 22 were then used to identify a clone which, as the FISH technique demonstrated, derives from region q24 of chromosome 11.

Comparing the restriction maps for the intact and rearranged regions of chromosomes 11 and 22 indicates, at the level of resolution allowed by the study, that the translocation is simple and reciprocal.

The locus of chromosome 11 was extended over 100 Kb by recurrent isolation of overlapping cosmids. Screening the same group of twenty tumors with probes of this region has made it possible to identify 17 breakpoints on chromosome 11. They are distributed without obvious clustering in a region of more than 40 Kb, called EWSR2. Interestingly, the two tumors that have a variant translocation, and both of the two that have a cytogenetically intact chromosome 11, are altered in region EWSR2; this is evidence of a submicroscopic rearrangement in these tumors.

On the molecular level, the positions of the breakpoints in the ES and PN tumors do not demonstrate obvious specificity, suggesting that they do not allow differentiation of these two very close cancers.

II--Characterization of the Genes Involved in the Translocation

The region EWSR1 is flanked by three groups of sites for rare-cutting restriction endonucleases. In human cells these sites appear to be at least partially nonmethylated, which suggests that they belong to HTF islands (S. Lindsay, A. P. Bird, Nature 327, 336-338 (1987)).

As has been demonstrated by cross hybridization with mouse DNA and hamster DNA, the region EWSR1 is included in a phylogenetically conserved region. Fragments were selected for screening Northern blots prepared from RNA extracted from 7 PNET tumors exhibiting a translocation t(11;22), three non-karyotyped ES tumors, several normal tissues (lung, heart, liver, pancreas, placenta, kidney, skeletal muscle) and four control tumors or cell lines (neuroblastoma, pheochromocytoma, adenocarcinoma of the colon, HeLa).

An EcoRI restriction fragment of 3 Kb, named 22RR3, which is centromeric in region EWSR1, detects a 2.5 Kb transcript in all specimens, and in the 10 PNETs tested it detects one other specific transcript of variable size.

The same 2.5 Kb transcript, but not the specific transcripts of the PNET tumors, is observed with a telomeric probe of region EWSR1 (probe 22RR12). Conversely, a probe distal from the region EWSR2 of chromosome 11 (probe 11RR1) gives the opposite result, by detecting the specific transcript of PNET tumors and not the 2.5 Kb transcript. These results strongly suggest the presence in the PNET tumors tested of a chimera RNA that fuses together the sequences coded by chromosome 22 and chromosome 11. Moreover, the probe 11RR1 demonstrates a transcript in the messenger RNAs of lung, heart, and liver tissues, which is not observed in the other normal tissues tested.
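The reasoning behind this inference can be sketched in a few lines of Python. This is only an illustrative model, not part of the described method: the segment labels (`chr22_centromeric`, `chr22_telomeric`, `chr11_distal`) are hypothetical names introduced here, and the normal Hum-Fli-1 transcript is left out because the text reports no such transcript in the PNET tumors tested.

```python
# Segments carried by each transcript expressed in a PNET tumor.
# Segment labels are hypothetical and serve only to illustrate the logic.
TRANSCRIPTS = {
    "EWS (normal, 2.5 Kb)": {"chr22_centromeric", "chr22_telomeric"},
    "EWS/Hum-Fli-1 (chimera)": {"chr22_centromeric", "chr11_distal"},
}

# Segment recognized by each probe, following the positions given in the text.
PROBES = {
    "22RR3": "chr22_centromeric",   # centromeric in EWSR1
    "22RR12": "chr22_telomeric",    # telomeric in EWSR1
    "11RR1": "chr11_distal",        # distal from EWSR2
}


def detected(probe):
    """Return the sorted names of the transcripts a probe hybridizes with."""
    target = PROBES[probe]
    return sorted(name for name, segments in TRANSCRIPTS.items() if target in segments)


if __name__ == "__main__":
    for probe in PROBES:
        print(probe, "->", detected(probe))
```

Under this model, only a transcript that starts on chromosome 22 and ends on chromosome 11 reproduces the three hybridization patterns described above.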

Probes 22RR3 and 22RR12 were used to screen a bank of human cDNA, and the straddling clones that hybridize with the two probes have been characterized. The largest clone contains an open reading frame of 1968 bp, whose first ATG codon lies in the context of a Kozak consensus sequence; it codes a protein of 656 amino acids, named EWS. A search of the data bases (NBRF and Swissprot) has revealed that the sequence of the first 285 amino acids has a homology with proteins such as gluten, gliadin, chorionic protein S36, annexin VII, and the hordeins B1 and C.

Nevertheless, the greatest homologies have been observed with the C-terminal domain of the large subunit of eukaryotic RNA polymerase II (CTD-polII).

The various molecules contain a domain that includes the repetition of a peptide of seven amino acid residues including tyrosine and major proportions of proline and serine. This domain may take on a secondary structure named in particular pro-β (N. Matsushima, C. E. Creutz, R. H. Kretsinger, Proteins 7, 125-155 (1990)).

The C-terminal portion of the protein EWS contains three regions (300-340, 454-513, 559-640) rich in glycine (46%), arginine (19%) and proline (13%), which have homologies with glycine-rich proteins such as collagen, keratin and the proteins that bind single-strand nucleic acids.

A sequence of 85 amino acids disposed between the first and second regions is homologous with a sequence encountered in several proteins that bind RNA. This domain contains the RNP-1 and RNP-2 consensus sequences (R. J. Bandziulis, M. S. Swanson, G. Dreyfuss, Genes Dev. 3, 431-437 (1989)) and has been demonstrated to be the RNA recognition motif of the 70 Kd snRNP U1 protein (C. C. Query, R. C. Bentley, J. D. Keene, Cell 57, 89-101 (1989)). In this region, other similar homologies have been found with several functionally characterized proteins that bind RNA. Nevertheless, the highest rate of homology has been obtained with the product of translation of the cDNA of the Drosophila clone pen p19, which does not have a known function and contains repeated elements of the PEN type (S. R. Haynes, M. L. Rebbert, B. A. Mozer, F. Forquignon, I. B. Dawid, Proc. Natl. Acad. Sci. USA 84, 1819-1823 (1987)).

The probe called 11RR1 has been used to retrieve 11 straddling clones that originate in a human marrow cDNA bank. The longest clone, named BM025, with a 2939 bp insert, contains a 1356 bp open reading frame.

Analyzing this sequence of deduced amino acids reveals a homology with members of the family of Ets genes. For the human gene Ets-1, this homology is close to 33% in the transcription activation domain (A. Gutman, C. Wasylyk, Trends Genet. 7, 49-54 (1991)) and reaches 70% for the DNA fixation domain (F. D. Karim, L. D. Urness, C. S. Thummel, et al., Genes Dev. 4, 1451-1453 (1990)).

The most striking homology has been demonstrated with the mouse gene Fli-1, for which the amino acid identity reaches 97% (Y. Ben-David, E. B. Giddens, K. Letwin, A. Bernstein, Genes Dev. 5, 908-918 (1991)).
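For readers unfamiliar with how such identity percentages are obtained, the following Python sketch computes the percent identity of two pre-aligned amino acid sequences. It is a minimal illustration under stated assumptions: the text does not say which alignment program or which denominator was used, so the choice to ignore gapped columns is an assumption, and the example sequences in the usage note are invented toy strings, not EWS, HUM-FLI-1 or FLI-1 sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues between two aligned sequences.

    Columns in which either sequence has a gap ('-') are ignored;
    this denominator is an assumption, other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

As a usage example, `percent_identity("MKTAYIAKQR", "MKTAYLAKQR")` compares ten aligned positions of which nine are identical.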

III--Characterization of Chimera mRNA Coding for the Hybrid Proteins

The hybridization of the 5' ends of the two cDNAs on the contigs of chromosomes 11 and 22 has shown that the two genes are transcribed in the direction from the centromere to the telomere.

The fusion transcript demonstrated by Northern blot is initiated at chromosome 22 and terminated at chromosome 11. In order to study the junction of the two genes, the inventors have looked for the exons that centromerically flank the region EWSR1 and telomerically flank the region EWSR2. Two genome fragments named 22HP.5 and 11RR1, which hybridize respectively with the cDNAs coding for the EWS and HUM-FLI-1 proteins, have been sequenced and have revealed the presence of an exon in each case.

Oligonucleotides homologous with these exons have been used to perform reverse transcription amplified by PCR of the RNAs originating in the various sources.

No product is amplified from the RNAs originating in breast tissue, neuroblastoma (IMR32), pheochromocytoma, lymphoma, or ovarian carcinoma, whereas all the RNAs originating from the seven PNET tumors with a translocation t(11;22) and those originating in the three non-karyotyped ES tumors enable the amplification of a specific product. Depending on the tumor, three different sizes of amplification products were initially observed. Their sequences reveal three types of fusion transcripts.

The first type contains exon sequences present in the fragments 22HP.5 and 11RR1 and a 174 bp sequence originating in the adjacent exon that is most centromeric in the Hum-Fli-1 gene.

The second and third types differ from the first in the presence, at the level of the junction site, of additional sequences originating respectively in the coding regions of the Ews and Hum-Fli-1 genes. In all cases, the fusion is in phase, and the resultant chimera proteins differ from the protein EWS by the substitution, for the RNA fixation domain, of the DNA fixation domain of the HUM-FLI-1 protein homologous to the domain of the protein ETS. A similar study made on approximately 40 ES and PN tumors has made it possible to demonstrate other types of fusion genes resulting from the chromosomal translocation t(11;22).

IV--Discussion

The protein EWS, through its total sequence, shares homologies with proteins known to interact with single-strand nucleic acids, and more particularly with RNA.

First, the C-terminal region contains an RNA recognition pattern which is also encountered in a group of proteins that participate in the post-transcriptional processing of RNA (A. D. Frankel, I. W. Mattaj, D. C. Rio, Cell 67, 1041-1046 (1991)). Furthermore, in the protein EWS this pattern is flanked by sequences of amino acids rich in glycine. Such sequences, found in various RNA fixation proteins, also interact with RNA (A. Kumar, J. R. Casas-Finet, C. J. Luneau, et al., J. Biol. Chem. 265, 17094-17100 (1990); S. H. Munroe, X. Dong, Proc. Natl. Acad. Sci. USA 89, 895-899 (1992)).

Second, the N-terminal region of the protein EWS has homology with the region CTD-polII. It has been suggested that this domain interacts with transcription factors at the level of the initiation complex (J. L. Corden, TIBS 15, 383-387 (1990)).

These homologies of the EWS protein suggest that it has two different functional domains that together participate in the mechanism of gene expression.

The family of genes Ets is involved through various mechanisms in the erythroleukemias induced in the mouse and in chickens by retroviruses. The first member of this family has been discovered to be a co-transduced element that gives rise to hybrid proteins containing MYB and ETS-1 (D. K. Watson, R. Ascione, T. S. Papas, Crit. Rev. Oncogenesis 1, 409-436 (1990)).

Independently, two other members of this family, Spi-1 (PU.1) and Fli-1, are activated by the retroviral integration of the various strains of the Friend leukemia virus (Y. Ben-David, A. Bernstein, Cell 66, 831-834 (1991)). All the members of this family of proteins have a highly conserved region, called the ETS domain (F. D. Karim, L. D. Urness, C. S. Thummel, et al., Genes Dev. 4, 1451-1453 (1990)), which in the majority of cases has been demonstrated to bind specifically to the purine-rich elements of the promoter region and to promote the transcriptional activation of various viral or eukaryotic cell genes (A. Gutman, C. Wasylyk, Trends Genet. 7, 49-54 (1991); F. Lim, N. Kraut, J. Frampton, T. Graf, EMBO J. 11, 643-652 (1992); R. A. Hipskind, V. N. Rao, C. G. F. Mueller, et al., Nature 354, 531-534 (1991)).

Furthermore, toward the N-terminal, the proteins ETS-1 and ETS-2 contain a region that promotes transcription when it is linked with the DNA fixation domain of LexA (B. Wasylyk, A. Gutman, P. Flores, A. Begue, D. LePrince, D. Stehelin, Nature 346, 191-193 (1990)), or Gal4 (S. Seneca, B. Punyammalee, N. Bailly, et al., Oncogene 6, 357-360 (1991)).

The strongest homology relating to the product of translation of Hum-Fli-1 appears with the murine protein FLI-1 (Y. Ben-David, E. B. Giddens, K. Letwin, A. Bernstein, Genes Dev. 5, 908-918 (1991)), a protein for which the homologies with the DNA fixation and ETS-1 transcription activation domains can be clearly identified.

It has been demonstrated that the retroviral insertion site that activates the murine gene Fli-1 is located near the gene Ets-1 on chromosome 9 in the mouse. The insertion site is phylogenetically conserved and is homologous with a region of human chromosome 11 near the gene Ets-1 (V. Baud, M. Lipinski, E. Rassart, L. Poliquin, D. Bergeron, Genomics 11, 223-224 (1991)). On the basis of both this homology and this syntenic conservation, it is proposed that Hum-Fli-1 represents the cDNA of the human gene homologous to the murine gene Fli-1.

Cloning the breakpoints of somatically acquired recurrent chromosomal translocations has illuminated two major mechanisms of cancerization: deregulation of the expression of a gene, and generation of fusion proteins (E. Solomon, J. Borrow, A. D. Goddard, Science 254, 1153-1160 (1991)).

The translocation described in the discussion of the invention clearly results in the fusion of two genes which belong to families until now not implicated in human carcinogenesis: the family ETS of proteins that bind DNA, and the family of proteins that bind RNA. Because the translocation is reciprocal, without apparent loss of genetic material, at each of the two derivative chromosomes the 5' end of one gene is juxtaposed with the 3' end of the other gene.

The chimera gene generated on the derivative (11) is not expressed at a sufficient level to be measured by northern blot and does not seem to be involved in the tumoral phenotype, since the derivative (11) can occasionally be lost in ES tumors (C. Turc-Carel, I. Philip, M. P. Berger, T. Philip, G. M. Lenoir, Cancer Genet. Cytogenet. 12, 1-19 (1984); E. C. Douglass, M. Valentine, A. A. Green, F. A. Hayes, E. I. Thompson, J. N. C. I. 77, 1211-1213 (1986)). Conversely, the hybrid transcript generated by derivative (22) is visible by northern blot even though its intracellular level appears lower than that of the normal transcript of the gene Ews encoded by chromosome 22. This difference could be due to an instability conferred by a long noncoding AT-rich 3' sequence (G. Brawerman, Cell 57, 9-10 (1989)), like that encountered in the Hum-Fli-1 transcript.

The translocation t(11;22) has two direct consequences:

First, it puts the expression of the DNA fixation domain, that of Hum-Fli-1, under the control of an ectopic promoter, the Ews promoter, which, as northern blot hybridization has demonstrated, does not share the tissue specificity of the Hum-Fli-1 promoter.

Second, the translocation substitutes the DNA fixation domain of Hum-Fli-1 for an RNA fixation domain and connects it, in the same polypeptide chain, to a domain that has homology with CTD-polII.

The constant involvement of the 22q12 band and more precisely the region EWSR1 in ES and PN tumors indicates that this latter domain plays an essential role in the process of cancerization. The structure of the chimera protein indicates that this role is probably played out by way of alteration of the regulation of transcription of the target genes of Hum-Fli-1.

By analogy with the mode of action of CTD-polII (C. L. Peterson, W. Kruger, I. Herskowitz, Cell 64, 1135-1143 (1991)), the chimera protein can functionally impede the negative regulation elements that control transcription. One possible example of such alteration could be the antigen MIC2, which is specifically over-expressed in PNET tumors that have a translocation t(11;22) (I. M. Ambros, P. F. Ambros, S. Strehl, et al., Cancer 67, 1886-1893 (1991); P. Garin-Chesa, E. J. Fellinger, A. G. Huvos, et al., Am. J. Pathol. 139, 275-286 (1991)).

The implication in a solid tumor of a gene of the Ets type indicates that the role of the family of proteins ETS in the cancerization process is not limited to hematologic cancers. In the case of erythroleukemias, in which this family is implicated, a highly transformed phenotype is associated with other alterations, intervening either by co-transduction of Myb sequences or by an alteration of the gene TP53 (Y. Ben-David, A. Bernstein, Cell 66, 831-834 (1991)).

In the case of Ewing's sarcoma, besides the translocation t(11;22), other recurrent chromosomal aberrations including a non-compensated translocation t(1q16p) have been described (E. C. Douglass, M. Valentine, A. A. Green, F. A. Hayes, E. I. Thompson, J. N. C. I. 77, 1211-1213 (1986); F. Mugneret, S. Lizard, A. Aurias, C. Turc-Carel, Cancer Genet. Cytogenet. 30, 239-245 (1988)). Their contribution to the tumoral phenotype has yet to be evaluated.

Analyses of karyotypes demonstrating the presence of a translocation t(11;22) in small cell tumors have been used for several years as a diagnostic criterion of ES and PN tumors. Two new approaches for diagnosis are now available:

The first is based on the fact that the small size of the region EWSR1 allows simple detection of genomic rearrangements by the Southern blot technique;

The second is based on reverse transcription and gene amplification by PCR and furnishes a sensitive method for demonstrating the presence of fusion transcripts in tumors and in their potential sites of metastasis.

It is currently possible to obtain indications as to the frequency and specificity of the Ews and Hum-Fli-1 alterations in human tumors, and more particularly in those exhibiting cytogenetic aberrations 22q12 and/or 11q24 (J. Whang-Peng, C. E. Freter, T. Knutsen, J. J. Nanfro, A. Gazdar, Cancer Genet. Cytogenet. 29, 155-157 (1987); J. D. Chadarevian, M. Vekemans, T. A. Seemayer, N. Eng. J. Med. 311, 1702-1703 (1984); A. O. Cavazzana, S. Navarro, N. Noguera, et al., Adv. Neuroblastoma Res. 2, 463-473 (1988); N. V. Vigfusson, L. J. Allen, J. H. Philip, T. Alschibaja, W. G. Riches, Cancer Genet. Cytogenet. 22, 211-218 (1986); A. Aurias, C. Rimbaut, C. Buffe, J. Dubousset, A. Mazabraud, N. Eng. J. Med. 309, 496-497 (1983); C. Turc-Carel, I. Philip, M. P. Berger, T. Philip, G. M. Lenoir, N. Eng. J. Med. 309, 497-498 (1983); J. Whang-Peng, T. J. Triche, T. Knutsen, J. Miser, E. C. Douglass, M. A. Israel, N. Eng. J. Med. 311, 584-585 (1984); C. Turc-Carel, A. Aurias, F. Mugneret, et al., Cancer Genet. Cytogenet. 32, 229-230 (1988); C. Turc-Carel, P. Dal Cin, U. Rao, C. Karakousis, A. Sandberg, Cancer Genet. Cytogenet. 30, 145-150 (1988); W. P. V. Shen, R. F. Young, B. N. Walter, B. H. Choi, M. J. Smith, J. Katz, Cancer Genet. Cytogenet. 45, 207-217 (1990); J. M. Trent, Y. Kaneko, F. Mitelman, Cytogenet. Cell Genet. 51, 533-562 (1989)).

The hypothesis of a relationship between Ews and a hereditary genetic defect responsible for type II neurofibromatosis, which has been localized in the same region 22q12 (G. A. Rouleau, B. R. Seizinger, W. Wertelecki, et al., Am. J. Hum. Genet. 46, 323-328 (1990)), can also be examined.

V--Description of the Figures

FIG. 1

1) Results

FIG. 1 shows the map of the region EWSR1 on chromosome 22.

Shown schematically at A is a part of the contig resulting from the expansion of the Cos5/Cos6 locus identified by differential Alu-PCR hybridization (J. Zucman, O. Delattre, C. Desmaze, C. Azambuja, G. Rouleau, P. DeJong, A. Aurias, G. Thomas, Genomics, in press).

The position of the contig is indicated with respect to other nearby loci. The cosmids B6 and G9 cover the breakpoints of A3EW2-3B and ALU 6, two hybrid somatic cells that contain the derivative (11) of ES and PN tumors, respectively (A. H. M. Geurts van Kessel, C. Turc-Carel, A. Klein et al., Mol. Cel. Biol. 5, 427-429 (1984); F. Zhang, O. Delattre, G. Rouleau et al., Genomics 6, 174-177 (1990)).

Shown at B is the EcoRI restriction map of a 100 Kb fragment centered around the region EWSR1. The position of three CpG dinucleotide regions is shown: CpG1 contains three sites SacII, three sites BssHII, and one site MluI; CpG2A contains three sites SacII and three sites BssHII; CpG2B contains three sites BssHII, three sites SacII, and one site NotI.

Shown at C is a detailed restriction map of the region EWSR1 on which the sites PstI are marked (P), the sites BamHI are marked (B), the sites XbaI are marked (X), and the sites EcoRI are marked (R). The vertical arrows indicate the rearranged restriction fragments in each of the 20 tumors tested.

2) Method

a) A bank specific to chromosome 22, LL22NC01, constructed in the Lawrist 5 cosmid vector, was used to illustrate what was learned about chromosome 22. The terminal fragments or the entire insert were labeled by random priming, using dCTP labeled at the α position with ³²P. The repetitive human sequences and the residual contaminant vector sequences were inhibited with a large excess of total human DNA and vector DNA.

The preincubated probe was then used to screen the bank, by standard procedures, and the straddling cosmids were identified.

b) The tumor specimens were recovered immediately after surgery.

A fragment was used for karyotype analysis:

tumors T2 to T11 have a typical translocation t(11;22)(q24;q12);

tumor T12 has a complex translocation t(10;11;22;12)(q22;q24;q12;q24);

tumor T13 has a translocation variant t(14;22)(q32;q12);

tumor T14 has a translocation variant t(7;22)(q35;q12);

tumor T15 has a complex translocation t(11;11;22)(q13;q24;q12).

Karyotype analysis of tumors T16 through T19 was not done, and tumor T20 has only normal metaphases. Bicolor FISH analysis on interphase nuclei of tumors T17 and T19 shows the appearance of a breakpoint in band 22q12 (C. Desmaze, J. Zucman, O. Delattre, G. Thomas, A. Aurias, Genes Chr. Cancer, in press).

Tumor T1 corresponds to the hybrid cell line A3EW2. The hybrid cell line Alu6 originates from tumor T8. Tumors T7-T13 are PN tumors; all the other tumors were diagnosed as Ewing's sarcoma. The DNA of blood specimens (N) and tumor specimens (T) was digested with the enzyme PstI and analyzed by the Southern blot method with the 5.5-sac probe.

FIG. 2

This Figure shows the detection of rearranged bands in the tumors; more particularly, the detection of abnormal genome fragments in six tumors. The number of the tumor is shown at the top of the bands. The DNA specimens taken from the blood of patients carrying tumors T16 and T18 are indicated by (N). In each case, the fragment corresponding to the rearranged junction is indicated by a horizontal arrow.

FIG. 3

1) Results

FIG. 3 shows the restriction map of the regions involved in the chromosomal translocation t(11;22)(q23;q12); the normal regions (22n) and (11n) and the two derivatives 11 and 22 of the translocation t(11;22), der(11) and der(22), are shown for the tumor T11.

The double line represents chromosome 11; the heavy line represents chromosome 22.

In the derivatives der(11) and der(22), the single line between the parts of chromosomes 11 and 22 represents the EcoRI fusion fragment. EWSR1 and EWSR2 represent the smallest regions that contain all the breakpoints identified in chromosome 22 and chromosome 11, respectively. The vertical arrows indicate the position of the rearranged EcoRI fragments identified in 17 tumors.

The abbreviations designating the tumors are identical to those used for FIG. 1.

Finally, the positions of the probes 22RR3, 22RR12, 22HP.5 and 11RR1 are indicated.

2) Methods

A linker XhoI was inserted at the level of the site BamHI of a SuperCos vector sold by Stratagene. The 3' ends of the DNA of the cell line ICB 104, derived from tumor T11 (F. Zhang, O. Delattre, G. Rouleau et al., Genomics 6, 174-177 (1990)) and partially digested by MboI, were partially filled with dGTP and dATP. The high molecular weight fragments were purified on gel and ligated to the site XhoI of the modified SuperCos vector, itself partially filled with dTTP and dCTP. After packaging with the Gigapack Gold kit sold by Stratagene, the virions were used to infect the E. coli strain DH5 alpha MCR.

The resulting bank of 4×10⁵ independent cosmids was screened with the probes 22RR3 and 22RR12. The groups of straddling clones containing either a nonaltered region EWSR1 (22N) or the junction fragment of the derivative 22 of the translocation, that is der(22), or of the derivative 11 of the translocation, der(11), were identified. The probe 11RR1 made it possible to retrieve a cosmid which was demonstrated, by fluorescent in situ hybridization on chromosomes, to originate in band 11q24. This cosmid was used to extend the locus to the normal chromosome 11.

FIG. 4

This Figure shows the northern blot detection of abnormal transcripts in Ewing's sarcoma or cell lines.

The same northern blot, containing 1 μg of polyadenylated mRNA originating in tumors T19, T11, T8 and T18, or in the neuroblastoma line IMR32, was successively hybridized with the probes 22RR3, 22RR12 and 11RR1. N represents the normal transcript of the gene Ews; R represents the fusion transcripts.

Depending on the tumor, the sizes of the abnormal transcripts differ substantially. The same abnormal transcripts are detected both with the probe 22RR3 and with the probe 11RR1, which indicates that the translocation gives rise to the synthesis of a chimera transcript.

FIG. 5

1) Results

This Figure represents the reverse transcription and PCR detection of three types of chimera transcripts.

M is a size marker. BREAST corresponds to a breast tissue; OVARY corresponds to an ovarian carcinoma; IMR32 corresponds to a neuroblastoma cell line. The abbreviations designating the patients are identical to those of FIG. 1.

The three types of transcripts described in FIG. 9 hereinafter are indicated by arrows numbered 1, 2 and 3.

2) Method

The oligonucleotide 11A of the following formula:

    5' AGAAGGGTACTTGTACATGG 3'

was used as a primer for the reverse transcription of 1 μg of total RNA, using a PCR kit Gen Amp RNA made by Cetus. The resultant cDNA was subjected to 30 cycles of PCR amplification with the primers 11.3 and 22.3, of the following respective formulas:

    5' ACTCCCCGTTGGTCCCCTCC 3'

    5' TCCTACAGCCAAGCTCCAAGTC 3'.

Each cycle included a denaturation step at 90° C. for 30 seconds, an annealing step at 65° C. for one minute, and an extension step at 72° C. for two minutes. The amplified fragment was identified by gel electrophoresis and revealed by ethidium bromide.
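As a quick sanity check on the three oligonucleotides given above, the following Python sketch computes their length, GC content, and a rough melting temperature. It is only an illustration: the Wallace rule, Tm = 2(A+T) + 4(G+C), is a common rule of thumb for short oligonucleotides and is used here as an assumption; the text does not state how the primers were designed or how their melting temperatures were estimated.

```python
# Primer sequences exactly as given in the text (5' to 3').
PRIMERS = {
    "11A": "AGAAGGGTACTTGTACATGG",
    "11.3": "ACTCCCCGTTGGTCCCCTCC",
    "22.3": "TCCTACAGCCAAGCTCCAAGTC",
}


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule (an assumption here)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        gc_percent = 100.0 * gc_count(seq) / len(seq)
        print(f"{name}: {len(seq)} nt, GC {gc_percent:.0f}%, Tm ~{wallace_tm(seq)} C")
```

The `reverse_complement` helper is included because the reverse-transcription primer and the downstream PCR primer must be complementary to the mRNA strand, which is convenient to verify against a cDNA sequence.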

FIG. 6

1) Results

This Figure represents the nucleotide sequence of the cDNA containing the entire coding region and the untranslated 3' end of the gene Ews, as well as the sequence of amino acids deduced from this cDNA, codon by codon.

These sequences are numbered on the left. Two polyadenylation signals, one beginning at nucleotide 2143 and the other at nucleotide 2332, are underlined.

The first codon of methionine is localized with a purine (A) at position -3 and a guanosine (G) at position +4, which matches the Kozak consensus sequence.
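The criterion just stated (a purine at position -3 and a G at position +4, counting the A of the ATG as +1) can be expressed as a small Python check. This is a minimal sketch of that two-position test only, not a full Kozak scoring scheme, and the function name and the example sequences in the usage note are hypothetical.

```python
def kozak_context_ok(seq, atg_index):
    """Check the two key Kozak positions around an ATG.

    `atg_index` is the 0-based index of the A of the ATG, so position -3
    is seq[atg_index - 3] and position +4 is seq[atg_index + 3].
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Both flanking positions must exist in the sequence.
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return False
    purine_at_minus_3 = seq[atg_index - 3] in "AG"
    g_at_plus_4 = seq[atg_index + 3] == "G"
    return purine_at_minus_3 and g_at_plus_4
```

For example, with the invented sequence `"GCCACCATGGCG"` and `atg_index=6` the check passes, whereas `"GCCTCCATGTCG"` fails at both positions.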

2) Method

The probes 22RR3 and 22RR12 were used to screen a bank of human fetal brain cDNA (Stratagene catalog number 936206) and made it possible to identify a clone which has been named BF1AC5. The probe 11RR1 was used to screen a bank of human marrow cDNA (Clontech catalog number HL1058) and made it possible to identify a clone which has been named BM025. One million clones from each bank were plated and screened. The DNA fragments subcloned in the phages M13mp18 or M13mp19 were used as templates to determine the nucleotide sequences, using the method of dideoxy chain termination and either a modified T7 polymerase or the Taq polymerase sold by Amersham. The 2372 bp and 2939 bp cDNA sequences of the clones BF1AC5 and BM025 were determined on both strands of straddling subclones, by using either the primer M13 or commercial primers. It has been demonstrated that they contain the entire coding sequence of the genes Ews and Hum-Fli-1.

Direct sequencing of the products of PCR amplification was done with Sequenase sold by USB, after 30 cycles of asymmetrical amplification with either primer 11.3 or primer 22.3, and then purification through a Centricon 100 membrane sold by Amicon.

FIG. 7

This Figure shows the cDNA of gene Hum-Fli-1.

FIG. 8

FIG. 8 shows the nucleotide sequence of the fusion cDNA obtained by reverse transcription and PCR amplification of the type 1 fusion transcript.

The sequences identical (at the 5' end) or complementary (at the 3' end) to the primers used for the PCR amplification are underlined. The vertical line indicates the junction between the two genes, which lies between the first and second positions of codon 265 of the gene Ews and between the same positions of codon 219 of the gene Hum-Fli-1.
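Why a junction at the same codon position in both genes keeps the fusion in phase can be shown with a little arithmetic, sketched here in Python. The function names are hypothetical; the codon numbers 265 and 219 and the break between the first and second codon positions are those given above.

```python
def coding_nt_before_break(codon, positions_before_break):
    """Coding nucleotides upstream of a break inside a codon.

    `codon` is the 1-based codon number; `positions_before_break` is how
    many positions of that codon (0, 1 or 2) lie upstream of the break.
    """
    return 3 * (codon - 1) + positions_before_break


def fusion_in_frame(codon_5, pos_5, codon_3, pos_3):
    """True if joining the 5' partner to the 3' partner preserves the reading frame."""
    kept = coding_nt_before_break(codon_5, pos_5)      # nucleotides contributed by the 5' gene
    removed = coding_nt_before_break(codon_3, pos_3)   # nucleotides lost from the 3' gene
    return kept % 3 == removed % 3


if __name__ == "__main__":
    # Type 1 fusion: break after position 1 of Ews codon 265
    # and after position 1 of Hum-Fli-1 codon 219.
    print(coding_nt_before_break(265, 1), coding_nt_before_break(219, 1))
    print(fusion_in_frame(265, 1, 219, 1))
```

The Ews part contributes 793 coding nucleotides and 655 are removed from Hum-Fli-1; both leave a remainder of 1 when divided by 3, so the codon interrupted in Ews is completed by the last two positions of the interrupted Hum-Fli-1 codon.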

FIG. 9

This Figure shows the various domains of the protein EWS coded by the gene Ews.

Shown at A are the various peptide domains. The shaded region represents the first 270 amino acids containing a high proportion of tyrosine, glutamine, serine, threonine, glycine, alanine and proline, which together represent approximately 90% of all the residues. In this region, the majority of the tyrosines are present every 5 to 9 residues, and they define a degenerate pattern repeated 31 times. After the tyrosine, the most recurrent residues in the repetition are a serine at position -1 (50%), a glycine at position +1 (50%), and two glutamines in positions +2 and +3 (70% and 40%, respectively). This part of the molecule has a homology with CTD-polII.

The three shaded zones correspond to regions rich in glycine, arginine and proline.

The zone marked "RNA BD" represents an assumed RNA fixation domain, which is explained at B.

The arrows indicate the position of the junction point Hum-Fli-1 for the three different types of chimera proteins, as deduced by reverse transcription and PCR amplification as explained above.

Shown at B are the most significant alignments of the protein EWS in the assumed RNA fixation region. Dpen p19 for the Drosophila clone pen p19 (S. R. Haynes, M. L. Rebbert, B. A. Mozer, F. Forquignon, I. B. Dawid, Proc. Natl. Acad. Sci. USA 84, 1819-1823 (1987)); H nucl for human nucleolin (M. Srivastava, O. W. McBride, P. J. Flemming, H. B. Pollard, A. L. Burns, J. Biol. Chem. 265, 14922-14931 (1990)); HsnRNP U1 for 70 Kd human snRNP U1 (R. A. Spritz, K. Strunk, C. S. Surowy, S. O. Hoch, D. E. Barton, U. Francke, Nucleic Acids Res. 15, 10173-10393 (1987)); HhnRNP A1 and B1 for human hnRNP A1 (G. Biamonti, M. Buvoli, M. T. Bassi, C. Morandi, F. Cobianchi, S. Riva, J. Mol. Biol. 207, 491-503 (1988)) and B1 (C. G. Burd, M. S. Swanson, M. Goerlach, G. Dreyfuss, Proc. Natl. Acad. Sci. 86, 9788-9792 (1989)); HPABP for human poly(A) fixation protein (T. Grange, C. Martin de sa, J. Oddos, R. Pictet, Nuc. Acids Res. 15, 4771-4787 (1987)). #2, #3 and #4 relate to various RNA fixation domains within the same protein. The invariable positions among these proteins are shaded, and the minimum substitutions between EWS and at least four of the six proteins are indicated in heavy characters.

Domains I through IV have been described by C. C. Query, R. C. Bentley, J. D. Keene, Cell 57, 89-101 (1989). RNP-1 and RNP-2 relate to the consensus pattern commented on by R. J. Bandziulis, M. S. Swanson, G. Dreyfuss, Genes Dev. 3, 431-437 (1989). To the right in FIG. 9B, under the caption "identities", the percentage of identical residues between EWS and each of the RNA fixation domains in question is indicated. Also indicated on the C-terminal side is the presence of a region rich in glycine residues (GR) and a basic domain (Basic D).

FIG. 10

This Figure schematically shows the proteins coded by the normal gene Ews (EWS), the normal gene Hum-Fli-1 (HUM-FLI-1), and their fusion genes (EWS/Hum-Fli-1 type 1, 2 and 3).

In the three types of chimera proteins, the C-terminal portion of EWS containing the assumed RNA fixation domain (designated as RNA BD) is replaced with the C-terminal portion of the gene Hum-Fli-1 containing the domain ETS (designated ETS D). The type 1 chimera protein is entirely represented in the figure; the type 2 or 3 proteins may be deduced from that of type 1 by insertion of 84 or 22 additional amino acids originating in EWS or Hum-Fli-1, respectively. The positions of the interrupted codons are indicated in all cases.

FIGS. 11, 12, 13, 14, 15 and 16

FIG. 11 shows the restriction map of the genes Ews and Hum-Fli-1 at the level of their break regions EWSR1 and EWSR2. The position of the exons is indicated with respect to the restriction sites. The open reading frames are divided by the various exons; each intron between two coding exons interrupts the open reading frame in one of the three phases, A, B or C.

FIG. 12 represents the exon structure of the gene Ews.

It has also been possible to sequence all the intron-exon junctions for this gene, as shown in FIG. 12.

FIG. 13 shows the exon structure of the gene Hum-Fli-1.

It has also been possible to sequence the majority of the intron-exon junctions for this gene, as shown in FIG. 13.

It is accordingly possible, from this information, to contemplate the great number of possible fusion products of the two genes. A large number of fusion products that juxtapose these exons have been observed by reverse transcription and then gene amplification by PCR.

FIG. 14 shows the exon juxtapositions of the genes Ews and Hum-Fli-1 and the junction sequences of the resultant fusion transcripts. On the left in FIG. 14, the exons of Ews (shaded boxes) and the exons of Hum-Fli-1 (solid boxes) involved in the juxtaposition are schematically shown; shown on the right in FIG. 14 are the junction sequences of the fusion transcripts corresponding to the juxtapositions symbolized on the left in this figure, with the number of cases observed.

FIG. 15, using a nomenclature identical to that of FIG. 14, shows a case of juxtaposition observed concerning exon 8 of Ews and exon 7 of Hum-Fli-1, between which a sequence of unknown origin (alien) is interposed.

FIG. 16, using a nomenclature identical to that of FIGS. 14 and 15, shows four cases in which two different fusion transcripts have been observed, juxtaposing a series of exons of Ews and a series of exons of Hum-Fli-1, this being genuinely the result of alternative splicing. In the fusion sequences on the right of the figure, the asterisks indicate stop codons in the translated sequence.

FIG. 17

This Figure represents the sequence of the promoter region of the gene Ews.

EXAMPLE 2 STUDY OF THE RECURRENT CHROMOSOMAL TRANSLOCATION t(21;22) ASSOCIATED WITH EWING'S SARCOMA

FIG. 18, in a nomenclature identical to that of FIGS. 14, 15 and 16, shows the junction sequences of four fusion transcripts corresponding to five cases observed, which juxtapose exon 7 of the gene Ews with the assumed exons 6, 8 and 9 of the gene Erg and the exon 10 of Ews with the assumed exon 6 of Erg.

EXAMPLE 3 STUDY OF THE RECURRENT CHROMOSOMAL TRANSLOCATION t(12;22) ASSOCIATED WITH SOFT TISSUE MALIGNANT MELANOMA (STMM)

I--Description of the Figures

FIGS. 19 and 20 show the rearrangement of the gene Ews in the cell line SU-CCS-1 of STMM.

FIG. 19 shows the restriction map of the region EWSR1 and indicates the position of the probes used and the deduced positions of the breakpoint of chromosome 22 in the tumor Sten-1 and in the cell line SU-CCS-1.

Part (a) of FIG. 20 represents the analysis, by the Southern blot technique, of the control DNA and that of the SU-CCS-1 line doubly digested by EcoR1 and Pst1 and hybridized with the probe RR2 originating in EWSR1; in this figure, N indicates the normal band, and R indicates the band corresponding to the rearrangement.

Part (b) of FIG. 20 represents the northern blot detection of an abnormal transcript with the probe EWS-5' EB defined by the fragment 5' EcoR1/BamH1 of the cDNA of the gene Ews; the RNAs extracted from a HeLa cell line and a glioma cell line were used as controls; N indicates the normal transcript of the gene EWS, and R indicates an additional transcript.

Part (c) of FIG. 20 represents the analysis of the amplified products obtained in the last step of the RACE procedure; compared with the control RNA, which was used to promote the amplification from the normal EWS transcript (EWS), the RNA extracted from SU-CCS-1 has one additional amplified fragment, which has been shown to derive from the fusion transcript (Fusion).

FIGS. 21 and 22 relate to the identification of the fusion transcript EWS/ATF-1 in the STMM tumors and cell lines.

FIG. 21 represents the detection of the chimeric transcript by reverse transcriptase PCR (RT-PCR). After reverse transcription of the total RNA of the SU-CCS-1 line, of the STMM tumors Sten-1 and 5852/88 and of a HeLa cell line, PCR was done with the primers 22.1 (SEQ ID NO. 119) and ATF-1.1 (SEQ ID NO. 125), corresponding respectively to exon 7 of the gene Ews and to the 3' untranslated region of Atf-1. Analysis of the amplified products was done on a 1% agarose gel (control: no RNA).

FIG. 22 shows the sequence of the 954 bp cDNA fragment obtained by RT-PCR of the transcript resulting from the fusion of the genes Ews and Atf-1. The sequences identical (at the 5' end) or complementary (at the 3' end) to the primers used for the amplification are underlined in the figure. The vertical line indicates the junction between the two genes; it falls between the second and third positions of codon 325 of the gene Ews and between the same positions of codon 65 of the gene Atf-1. The contribution of the sequence of the gene Ews is indicated in bold characters. The position of the Ews-specific oligonucleotide used in the RACE procedure is indicated and underlined.
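The junction arithmetic described for FIG. 22 can be checked with a short sketch. It is only an illustration: the codon numbers (325 for Ews, 65 for Atf-1) come from the text, the coding-sequence start at nucleotide 25 is taken from the CDS LOCATION given for SEQ ID NO:1, and the helper name `coding_nt` is hypothetical.

```python
# Sketch: check that a junction placed between positions 2 and 3 of a codon
# in each partner gene keeps the reading frame.

def coding_nt(codon: int, position: int) -> int:
    """1-based index, within a coding sequence, of a codon position (1 to 3)."""
    if codon < 1 or position not in (1, 2, 3):
        raise ValueError("codon must be >= 1 and position must be 1, 2 or 3")
    return (codon - 1) * 3 + position

EWS_CDS_START = 25  # first nucleotide of the ATG in SEQ ID NO:1 (CDS 25..1992)

# Last Ews nucleotide retained: second position of codon 325.
ews_last = coding_nt(325, 2)
# First Atf-1 nucleotide retained: third position of codon 65.
atf1_first = coding_nt(65, 3)

# The fusion is in frame when Atf-1 resumes at the codon position that
# immediately follows the position at which Ews stops.
next_position_after_ews = (ews_last % 3) + 1
atf1_position = ((atf1_first - 1) % 3) + 1
in_frame = next_position_after_ews == atf1_position

print(ews_last)                      # coding nucleotides contributed by Ews
print(EWS_CDS_START + ews_last - 1)  # same nucleotide in SEQ ID NO:1 numbering
print(atf1_first)                    # first Atf-1 coding nucleotide retained
print(in_frame)
```

Under these assumptions the Ews part stops after two bases of its codon and the Atf-1 part supplies the third base, which is why the hybrid codon is complete and the downstream Atf-1 frame is preserved.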

FIGS. 23 and 24 represent the junctions of the chimeric transcripts and provide a schematic illustration of the proteins deduced from them.

FIG. 23 shows the partial sequence of the cDNA of the genes Ews and Atf-1 and of their hybrid transcripts in the junction regions, showing the open and closed junction boxes, respectively, for the hybrid transcript Ews/Atf-1 from the der(22), and for the transcript Atf-1/Ews from the der(12).

FIG. 24 shows the translation products corresponding to the four cDNAs indicated in FIG. 23. In the chimeric protein EWS/ATF-1, the C-terminal portion of EWS, containing two of the three glycine-rich regions (shaded regions) and the domain homologous to RNA-binding domains (RNA-BD), is replaced by the C-terminal part of ATF-1, containing the basic DNA-binding domain and the four leucine heptads that together define the bZIP domain.

The reciprocal product is indicated by ATF-1/EWS and is encoded by the chromosome der(12). The consensus recognition site for phosphorylation by protein kinase A (amino acids 60 through 63) is indicated by PKA; NTD-EWS indicates the N-terminal domain of EWS.

II--Method

1) Tumors and Cell Lines

The cell line SU-CCS-1 was established from a pleural effusion of STMM (L. Epstein, A. O. Martin, R. Kempson, Cancer Res. 44, 1265-1274 (1984)). Cytogenetic analysis revealed a complex karyotype having a single normal copy of chromosomes 12 and 22. The cell line was cultivated as originally described. The frozen fragments of primary STMM tumors, Sten-1 (G. Stenman, L-G Kindblom, L. Angervall, Genes Chrom. Cancer 4, 122-127, 1992), W9150 (F. Speleman, C. Colpaert, G. Goovaerts, J. G. Leroy, E. Van Marck, Cancer Genet. Cytogenet. 48, 176-179, 1992) and 5852/88 (J. A. Fletcher, Genes Chrom. Cancer 5, 184, 1992), were collected and kept frozen at -80° Celsius until used. All the primary tumors were cytogenetically characterized beforehand, and they all have a translocation t(12;22)(q13;q12). Extraction of the DNA and RNA and the Southern and northern blot analyses were done in accordance with standard procedures (T. Maniatis, E. F. Fritsch, J. Sambrook, Molecular Cloning. A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989).

2) Probes

The genome probes are shown in FIG. 19. The cDNA probes are as follows:

EWS 5' EB is the 0.8 Kb coding region of the cDNA of EWS that ends at a BamH1 site;

EWS 3' XE is the last 0.85 Kb coding region beginning at an XbaI site;

the probe ATF-1 3' was prepared by PCR and extends from nucleotide 754 to nucleotide 955 in FIG. 17.

3) Gene Amplification Procedure by PCR

All the PCR reactions were performed in 30 μl with the Amp TAQ kit sold by Cetus, using a cycler sold by Perkin Elmer. Unless otherwise indicated, 30 cycles were performed with the following parameters:

denaturation: at 94° Celsius for 30 seconds;

annealing: at the temperature indicated specifically for each case, for 60 seconds;

elongation: at 72° Celsius for 120 seconds.

The amplified products were analyzed on 1% agarose TBE gels.
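As an illustration only, the default cycling program can be written down as data. The sketch reads the denaturation temperature as 94° Celsius and calls the per-case temperature the annealing temperature; the class and function names are hypothetical, and the computed time covers the programmed holds only, not the ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def cycling_program(annealing_temp_c: float, cycles: int = 30):
    """Return the three steps of one cycle and the number of cycles."""
    steps = (
        Step("denaturation", 94.0, 30),
        Step("annealing", annealing_temp_c, 60),  # set specifically for each case
        Step("elongation", 72.0, 120),
    )
    return steps, cycles


def hold_time_minutes(steps, cycles: int) -> float:
    """Total programmed hold time in minutes, ignoring temperature ramps."""
    return cycles * sum(step.seconds for step in steps) / 60


steps, cycles = cycling_program(annealing_temp_c=60.0)
print(hold_time_minutes(steps, cycles))
```

With the default 30 cycles, each cycle holds for 210 seconds, so the programmed holds add up to 105 minutes whatever annealing temperature is chosen.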

4) RACE Procedure

One μg of polyA RNA of the cell line SU-CCS-1 was denatured for 10 minutes at 80° Celsius and reverse transcribed using the PCR Gen Amp RNA kit sold by Cetus. The initial reverse transcription was performed in 20 μl using the A3'NV primer. Incubation at 42° Celsius for 45 minutes was followed by 5 minutes at 94° Celsius. An aliquot of two microliters of the resultant cDNA was amplified by PCR, using a three-step procedure involving nested oligonucleotides. The first step used the primers 22.1 and A3'-4 in 20 cycles (annealing temperature 64° C.). 20 μl were analyzed on a 1% low-melting-point agarose gel. The portion of the gel containing the amplified products, of approximately 1.5 to 2.5 Kb, was collected, melted at 68° Celsius, and diluted in an equal volume of TE buffer. One microliter was subjected to PCR amplification (annealing temperature 66° C.), using primers 22.3 and A3'-5. Analysis of 20 μl on a 1% low-melting-point agarose gel revealed two bands. For further characterization, each band was cut from the gel, diluted to 1/1000 in TE buffer, and 1 μl was amplified by PCR (annealing temperature 67° C.), using primers 22.7 and A3'-6.
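The three amplification rounds of this procedure can be summarized as a small table, sketched here for illustration. The primer names and temperatures are those given in the text; the field names, the labelling of the A3' oligonucleotides as anchor primers, and the use of the default 30 cycles for the second and third rounds (for which no cycle number is stated) are assumptions.

```python
# Sketch of the three successive amplification rounds as plain data.
DEFAULT_CYCLES = 30  # the "unless otherwise indicated" value of the PCR section

RACE_ROUNDS = [
    {"gene_primer": "22.1", "anchor_primer": "A3'-4", "annealing_c": 64, "cycles": 20},
    {"gene_primer": "22.3", "anchor_primer": "A3'-5", "annealing_c": 66, "cycles": DEFAULT_CYCLES},
    {"gene_primer": "22.7", "anchor_primer": "A3'-6", "annealing_c": 67, "cycles": DEFAULT_CYCLES},
]


def check_rounds(rounds) -> bool:
    """True if no primer is reused and annealing temperatures never decrease."""
    primers = [r[key] for r in rounds for key in ("gene_primer", "anchor_primer")]
    temps = [r["annealing_c"] for r in rounds]
    return len(primers) == len(set(primers)) and temps == sorted(temps)


print(check_rounds(RACE_ROUNDS))
```

The check only confirms that each round uses a fresh primer pair at an equal or higher stringency; whether each pair lies inside the previous amplicon cannot be verified from the primer names alone.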

5) RT-PCR Analysis of the Transcripts of the STMM

One microgram of the total RNA was reverse transcribed using an oligo-dT as the primer and the PCR Gen Amp RNA kit made by Cetus, under the conditions described by the manufacturer. The cDNA obtained was subjected to three different PCR amplifications:

the amplified products corresponding to the transcripts Ews/Atf-1 were obtained with the primers 22.1 and ATF-1.1 (annealing temperature 60° C.);

the amplified products corresponding to the normal transcript Atf-1 were obtained with the primers ATF-1.3 and ATF-1.1 (annealing temperature 60° C.);

the amplified products corresponding to the transcripts Atf-1/Ews were obtained with the primers ATF-1.3 and 22.4 (annealing temperature 65° C.).
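The three amplifications listed above amount to a lookup from primer pair to target transcript, sketched below for illustration. The primer names and temperatures are taken from the list; the dictionary layout and the function name `detected_transcripts` are hypothetical.

```python
# Primer pair -> (transcript detected, annealing temperature in degrees C).
ASSAYS = {
    ("22.1", "ATF-1.1"): ("Ews/Atf-1 fusion transcript", 60),
    ("ATF-1.3", "ATF-1.1"): ("normal Atf-1 transcript", 60),
    ("ATF-1.3", "22.4"): ("Atf-1/Ews reciprocal fusion transcript", 65),
}


def detected_transcripts(positive_pairs):
    """List the transcripts indicated by the primer pairs that gave a product."""
    positives = set(positive_pairs)
    unknown = positives - set(ASSAYS)
    if unknown:
        raise KeyError(f"unknown primer pair(s): {sorted(unknown)}")
    return [target for pair, (target, _temp) in ASSAYS.items() if pair in positives]


# Example: a sample in which only the normal Atf-1 amplification gave a product.
print(detected_transcripts([("ATF-1.3", "ATF-1.1")]))
```

A sample that gives a product with all three pairs would be read as carrying both fusion transcripts in addition to the normal Atf-1 transcript.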

6) Sequencing

The products of the PCR were subcloned in the phages M13mp18 and M13mp19. In each case, three independent clones were entirely sequenced with the Taq polymerase kit made by Applied Biosystems, using dideoxynucleotides and fluorescent primers. The sequencing reactions were analyzed using an automatic sequencer made by Applied Biosystems.

7) Fluorescent In Situ Hybridization (FISH) Study

The library of genomic cosmids constructed from the ICB 104 cell line and described by Zucman et al. (J. Zucman, O. Delattre, C. Desmaze, B. Plougastel, I. Joubert, T. Melot, M. Peter, P. De Jong, G. Rouleau, A. Aurias, G. Thomas, Genes Chrom. Cancer 5, 271-277, 1992) was screened with the Atf-1 part of the fusion transcript Ews/Atf-1, and the cosmids CCS2.2, F7 and G9 corresponding to the 3' and 5' region, respectively, of Ews. Monocolor and bicolor FISH analyses were done as described by Desmaze et al. (C. Desmaze, J. Zucman, O. Delattre, G. Thomas, A. Aurias, Genes Chrom. Cancer 5, 30-34, 1992).

III--Results

1) Alteration of the EWS Protein in STMM

The DNA was extracted from a primary STMM tumor, called Sten-1, having a characteristic translocation t(12;22) (G. Stenman, L-G Kindblom, L. Angervall, Genes Chrom. Cancer 4, 122-127, 1992), and from a cell line called SU-CCS-1 having a complex karyotype containing an abnormal chromosome 12 (L. Epstein, A. O. Martin, R. Kempson, Cancer Res. 44, 1265-1274 (1984)). The DNAs were screened with probes originating in EWSR1. Abnormal fragments were demonstrated with the probes PR.8 for Sten-1 and RR2 for SU-CCS-1 (as shown in FIG. 19). Although constitutional DNA from the normal tissues of these cases was not available, these abnormal bands, which have never been observed with DNA extracted from normal tissues, strongly suggest an acquired somatic rearrangement (as shown in FIG. 16).

The RNAs originating in Sten-1, SU-CCS-1, and the primary STMM tumor 5852/88 (J. A. Fletcher, Genes Chrom. Cancer 5, 184, 1992) were analyzed by northern blot with probes from the 3' and 5' ends of the cDNA of the gene Ews. The normal 2.5 Kb transcript of the gene Ews was demonstrated with the two probes. However, in the three cases, each probe specifically revealed one additional band (probe 5', a clearly expressed 3 Kb transcript; probe 3', a diffuse 1.5 Kb transcript), suggesting that these abnormal transcripts could correspond to the fusion genes generated by the translocation t(12;22), as FIG. 20(b) shows.

2) Cloning of the Hybrid Transcript

In order to clone these transcripts, a method derived from the rapid amplification of cDNA ends (RACE) was employed (A. F. Frohman, M. K. Dush, G. R. Martin, Proc. Natl. Acad. Sci. USA 85, 8998-9002, 1988; J. B. Dumas Milene Edwards, J. Delort, J. Mallet, Methods in Molecular Biology, Vol. 16, Chap. 35, 1992). PolyA RNAs of SU-CCS-1 and HeLa cells were reverse transcribed using a (dT)₁₄ oligonucleotide tagged at its 5' end with an artificial sequence of 51 nucleotides (as indicated above). The cDNA sequences located between this tag and exon 7 of the gene Ews were amplified by PCR, using a three-step procedure. Agarose gel electrophoresis revealed that the HeLa RNAs led to the amplification of a single fragment 1.4 Kb in size, which was identified by hybridization as corresponding to the normal cDNA of the gene Ews. In addition to this fragment, the SU-CCS-1 RNA led to the amplification of a fragment of higher molecular weight, as indicated in FIG. 20(c). Sequencing revealed a foreign open reading frame fused to codon 325, at the 3' end of the eighth exon of the gene Ews. A search of the NBRF data bank revealed that this sequence codes for the C-terminal part of ATF-1, a cAMP-dependent transcription factor. At the nucleotide level, this sequence is identical to the 3' end of the cDNA sequence of the gene Atf-1 (T. Hai, F. Liu, W. J. Coukos, M. R. Green, Genes Dev. 3, 2083-2090, 1989; T. Yoshimura, J.-I. Fujisawa, M. Yoshida, EMBO J. 9, 2537-2542, 1990). A probe originating in this end, named ATF-1 3', hybridizes with the abnormal 3 Kb transcript observed by northern blot.

A more direct procedure for testing the presence of a fusion transcript Ews/Atf-1 in the tumor RNA was developed using a specific oligonucleotide derived from the 3' untranslated region of the gene Atf-1. This primer was used, in combination with an oligonucleotide homologous to exon 7 of the gene Ews, for PCR amplification of oligo-dT-primed cDNAs synthesized from the RNA of three known primary STMM tumors, of the SU-CCS-1 cell line and of control HeLa cells. Except for the HeLa RNA, which did not lead to any amplification, the four RNAs of the STMM cases led to a major 1 Kb fragment (as shown in FIG. 21). For each case, sequencing revealed the same in-frame junction, joining codon 325 of the gene Ews to codon 65 of the gene Atf-1 (as shown in FIG. 21). The fusion protein deduced as being encoded by this transcript has preserved the entirety of the N-terminal domain of the protein EWS and the major portion of the protein ATF-1 (as shown in FIG. 23).

3) Mapping of the Gene Atf-1 on Chromosome 12

The portion of the fusion transcript corresponding to the gene Atf-1 was used to retrieve a cosmid from the human genomic library. It was demonstrated by fluorescent in situ hybridization (FISH) on metaphase chromosomes that this cosmid hybridizes exclusively to the band 12q13, thus demonstrating that the gene Atf-1 is localized in this region of the genome. The complexity of the karyotype of the SU-CCS-1 cells rules out a simple formal cytogenetic demonstration of a translocation involving chromosomes 12 and 22. Nevertheless, bicolor FISH analysis of SU-CCS-1 interphase nuclei has shown that the locus of the gene Ews was split, and that the proximal portion of the gene Ews was juxtaposed with the locus of the gene Atf-1.

4) Reciprocal Fusion Gene

Transcription of the normal gene Atf-1 on chromosome 12 and that of the reciprocal fusion gene generated on chromosome der(12) have been studied by reverse transcriptase PCR. In all cases of STMM, the two transcripts were expressed; this result is compatible with the wide spectrum of expression of the gene Atf-1 (T. Yoshimura, J.-I. Fujisawa, M. Yoshida, EMBO J. 9, 2537-2542, 1990). In all cases, the amplified products originating from the reciprocal fusion transcript are identical in size but shorter than anticipated. Sequencing the junction region of the Sten-1 tumor cDNA revealed an out-of-frame fusion caused by the deletion or splicing out of the ninth exon of the gene Ews. The deduced translation product is a truncated protein composed of the first 65 amino acids of the protein ATF-1, followed by three aberrant amino acids (as shown in FIG. 23).

IV--Discussion

The present study demonstrates that the translocation t(12;22) associated with STMM generates hybrid genes associating part of the gene Ews and part of the gene Atf-1, whose structural and functional characteristics resemble those of the fusion of the genes Ews and Hum-Fli-1 in the translocation t(11;22) associated with Ewing's sarcoma. In both cases, the chimeric gene generated on chromosome der(22) codes for a protein in which the same N-terminal portion of the protein EWS is linked to the DNA-binding domain of a transcription factor. It has been demonstrated in model systems that this preserved N-terminal portion of the protein EWS, when it is linked to the DNA-binding domain of the proteins HUM-FLI-1, ETS-1 or GAL-4, leads to the transcription of specific reporter genes containing, in their promoter region, the corresponding response elements.

The hybrid protein deduced from the fusion DNA sequence of the genes Ews and Atf-1 contains the major portion of the protein ATF-1, of which one important functional domain is bZIP. This domain is known to mediate the dimerization of the protein and its binding to DNA. As a consequence, it is reasonable to expect that these two properties are retained in the hybrid protein. The hybrid protein should accordingly be capable of forming homodimers, and heterodimers with the transcription factor CREB, which is known to interact with the normal ATF-1 protein (C. Turc-Carel, A. Aurias, F. Mugneret, I. Lizard Sidaner, C. Volk, J. -P. Thiery, S. Olschwang, T. Philip, G. M. Lenoir, A. Mazabraud, Cancer Genet. Cytogenet. 32, 229-238, 1988; E. C. Douglass, M. Valentine, A. A. Green, F. A. Hayes, E. I. Thompson, J. Nat. Cancer Inst. 77, 1211-1213, 1988). It may also bind to the DNA motifs recognized by the ATF-1 protein. However, the chimeric protein has lost one consensus site for phosphorylation by protein kinase A, which can contribute to regulating the transcriptional activity of ATF-1 by cAMP (T. Yoshimura, J.-I. Fujisawa, M. Yoshida, EMBO J. 9, 2537-2542, 1990; K. J. Flink, N. C. Jones, Oncogene 6, 2019-2026, 1991).

The chimeric protein, which includes part of the protein EWS and part of the protein ATF-1 and thus potentially links the EWS transactivator domain with the bZIP domain of ATF-1, no longer regulated by cAMP, can alter the regulation of the transcription of genes normally controlled by ATF-1.

In the majority of STMM cases, the two hybrid transcripts are generated by a single cytogenetic translocation, suggesting that the transcriptional orientation of the gene Atf-1 is identical to that of the gene Ews, from the centromere to the telomere. Both are expressed. This situation has already been observed in several malignant hematologic tumors. In fact, it has been demonstrated that the chromosomes of the translocation t(15;17) associated with promyelocytic leukemia (D. C. Tkachuk, C. Kohler, M. L. Cleary, Cell 71, 691-700, 1992), and those of the translocation t(14;11) associated with acute lymphocytic leukemias in children (Y. Gu, T. Nakamura, H. Alder, R. Prasad, O. Canaani, G. Cimino, C. M. Groce, R. Canaani, Cell 71, 701-708, 1992), in each case express aberrant fusion transcripts. In STMM, the expression of a reciprocal fusion gene on the chromosome der(12) is compatible with the wide spectrum of expression of Atf-1. Nevertheless, because the two coding sequences are fused out of frame, its deduced expression product is made up nearly entirely of the first 65 amino acids of the N-terminal region of the protein ATF-1. The truncated ATF-1 protein would accordingly be unable to form dimers or to bind DNA. The contribution of this truncated protein to the phenotype of the tumor remains obscure. Nevertheless, it is presumably not essential to the proliferation of the tumor, since in one case the chromosome der(12) is deleted in 30% of the tumor cells (G. Stenman, L-G Kindblom, L. Angervall, Genes Chrom. Cancer 4, 122-127, 1992). In the case of Ewing's sarcoma, the reciprocal fusion is not expressed at a measurable level, and the chromosome der(11) is deleted only in some cases (C. Turc-Carel, A. Aurias, F. Mugneret, I. Lizard Sidaner, C. Volk, J. -P. Thiery, S. Olschwang, T. Philip, G. M. Lenoir, Cancer Genet. Cytogenet. 32, 229-238, 1988; E. C. Douglass, M. Valentine, A. A. Green, F. A. Hayes, E. I. Thompson, J. Nat. Cancer Inst., 1211-1213, 1986).

In the case of acute lymphocytic leukemia, the N-terminal transactivator domain of E2A can be linked to either the PBX1 domain or the bZIP domain of HLF (T. Inaba, W. M. Roberts, L. H. Shapiro, K. W. Jolly, S. C. Raimondi, S. D. Smith, A. T. Look, Science 257, 521-534, 1992; S. P. Hunger, K. Ohyashiki, K. Toyama, M. L. Cleary, Genes Develop. 6, 1608-1620, 1992). The present study demonstrates a similar mode of oncogenic conversion in solid tumors. In fact, the N-terminal transactivator domain of EWS can be fused to the DNA-binding domain of different families of transcription factors: the ETS domain in the case of Ewing's sarcoma, the bZIP domain in the case of STMM. This suggests a common oncogenic mechanism mediated by the N-terminal domain of EWS, contained in both of the chimeric proteins EWS/HUM-FLI-1 and EWS/ATF-1.

    __________________________________________________________________________    #             SEQUENCE LISTING                                                  - -  - - (1) GENERAL INFORMATION:                                             - -    (iii) NUMBER OF SEQUENCES: 129                                         - -  - - (2) INFORMATION FOR SEQ ID NO:1:                                     - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 2371 base - #pairs                                                (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: double                                                      (D) TOPOLOGY: linear                                                 - -     (ix) FEATURE:                                                                  (A) NAME/KEY: CDS                                                             (B) LOCATION: 25..1992                                               - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:                               - - GAGAACGAGG AGGAAGGAGA GAAA ATG GCG TCC ACG GAT T - #AC AGT ACC TAT            51                                                                                         - #         Met Ala Ser Thr Asp - #Tyr Ser Thr Tyr                            - #           1       - #        5                           - - AGC CAA GCT GCA GCG CAG CAG GGC TAC AGT GC - #T TAC ACC GCC CAG CCC           99                                                                       Ser Gln Ala Ala Ala Gln Gln Gly Tyr Ser Al - #a Tyr Thr Ala Gln Pro            10                 - # 15                 - # 20                 - # 25       - - ACT CAA GGA TAT GCA CAG ACC ACC CAG GCA TA - #T GGG CAA CAA AGC TAT          147                                                                       Thr Gln Gly Tyr Ala Gln Thr Thr Gln Ala Ty - #r Gly Gln Gln Ser Tyr                            30 - #                 35 - 
#                 40              - - GGA ACC TAT GGA CAG CCC ACT GAT GTC AGC TA - #T ACC CAG GCT CAG ACC          195                                                                       Gly Thr Tyr Gly Gln Pro Thr Asp Val Ser Ty - #r Thr Gln Ala Gln Thr                        45     - #             50     - #             55                  - - ACT GCA ACC TAT GGG CAG ACC GCC TAT GCA AC - #T TCT TAT GGA CAG CCT          243                                                                       Thr Ala Thr Tyr Gly Gln Thr Ala Tyr Ala Th - #r Ser Tyr Gly Gln Pro                    60         - #         65         - #         70                      - - CCC ACT GGT TAT ACT ACT CCA ACT GCC CCC CA - #G GCA TAC AGC CAG CCT          291                                                                       Pro Thr Gly Tyr Thr Thr Pro Thr Ala Pro Gl - #n Ala Tyr Ser Gln Pro                75             - #     80             - #     85                          - - GTC CAG GGG TAT GGC ACT GGT GCT TAT GAT AC - #C ACC ACT GCT ACA GTC          339                                                                       Val Gln Gly Tyr Gly Thr Gly Ala Tyr Asp Th - #r Thr Thr Ala Thr Val            90                 - # 95                 - #100                 - #105       - - ACC ACC ACC CAG GCC TCC TAT GCA GCT CAG TC - #T GCA TAT GGC ACT CAG          387                                                                       Thr Thr Thr Gln Ala Ser Tyr Ala Ala Gln Se - #r Ala Tyr Gly Thr Gln                           110  - #               115  - #               120              - - CCT GCT TAT CCA GCC TAT GGG CAG CAG CCA GC - #A GCC ACT GCA CCT ACA          435                                                                       Pro Ala Tyr Pro Ala Tyr Gly Gln Gln Pro Al - #a Ala Thr Ala Pro Thr                       125      - #           130      - #           135                  - - AGA CCG CAG GAT GGA AAC AAG CCC ACT GAG AC - #T AGT CAA CCT CAA TCT          483          
                                                             Arg Pro Gln Asp Gly Asn Lys Pro Thr Glu Th - #r Ser Gln Pro Gln Ser                   140          - #       145          - #       150                      - - AGC ACA GGG GGT TAC AAC CAG CCC AGC CTA GG - #A TAT GGA CAG AGT AAC          531                                                                       Ser Thr Gly Gly Tyr Asn Gln Pro Ser Leu Gl - #y Tyr Gly Gln Ser Asn               155              - #   160              - #   165                          - - TAC AGT TAT CCC CAG GTA CCT GGG AGC TAC CC - #C ATG CAG CCA GTC ACT          579                                                                       Tyr Ser Tyr Pro Gln Val Pro Gly Ser Tyr Pr - #o Met Gln Pro Val Thr           170                 1 - #75                 1 - #80                 1 -      #85                                                                              - - GCA CCT CCA TCC TAC CCT CCT ACC AGC TAT TC - #C TCT ACA CAG CCG        ACT      627                                                                    Ala Pro Pro Ser Tyr Pro Pro Thr Ser Tyr Se - #r Ser Thr Gln Pro Thr                          190  - #               195  - #               200              - - AGT TAT GAT CAG AGC AGT TAC TCT CAG CAG AA - #C ACC TAT GGG CAA CCG          675                                                                       Ser Tyr Asp Gln Ser Ser Tyr Ser Gln Gln As - #n Thr Tyr Gly Gln Pro                       205      - #           210      - #           215                  - - AGC AGC TAT GGA CAG CAG AGT AGC TAT GGT CA - #A CAA AGC AGC TAT GGG          723                                                                       Ser Ser Tyr Gly Gln Gln Ser Ser Tyr Gly Gl - #n Gln Ser Ser Tyr Gly                   220          - #       225          - #       230                      - - CAG CAG CCT CCC ACT AGT TAC CCA CCC CAA AC - #T GGA TCC TAC AGC CAA          771                                                            
           Gln Gln Pro Pro Thr Ser Tyr Pro Pro Gln Th - #r Gly Ser Tyr Ser Gln               235              - #   240              - #   245                          - - GCT CCA AGT CAA TAT AGC CAA CAG AGC AGC AG - #C TAC GGG CAG CAG AGT          819                                                                       Ala Pro Ser Gln Tyr Ser Gln Gln Ser Ser Se - #r Tyr Gly Gln Gln Ser           250                 2 - #55                 2 - #60                 2 -      #65                                                                              - - TCA TTC CGA CAG GAC CAC CCC AGT AGC ATG GG - #T GTT TAT GGG CAG        GAG      867                                                                    Ser Phe Arg Gln Asp His Pro Ser Ser Met Gl - #y Val Tyr Gly Gln Glu                          270  - #               275  - #               280              - - TCT GGA GGA TTT TCC GGA CCA GGA GAG AAC CG - #G AGC ATG AGT GGC CCT          915                                                                       Ser Gly Gly Phe Ser Gly Pro Gly Glu Asn Ar - #g Ser Met Ser Gly Pro                       285      - #           290      - #           295                  - - GAT AAC CGG GGC AGG GGA AGA GGG GGA TTT GA - #T CGT GGA GGC ATG AGC          963                                                                       Asp Asn Arg Gly Arg Gly Arg Gly Gly Phe As - #p Arg Gly Gly Met Ser                   300          - #       305          - #       310                      - - AGA GGT GGG CGG GGA GGA GGA CGC GGT GGA AT - #G GGC AGC GCT GGA GAG         1011                                                                       Arg Gly Gly Arg Gly Gly Gly Arg Gly Gly Me - #t Gly Ser Ala Gly Glu               315              - #   320              - #   325                          - - CGA GGT GGC TTC AAT AAG CCT GGT GGA CCC AT - #G GAT GAA GGA CCA GAT         1059                                                                       Arg Gly Gly Phe Asn Lys Pro Gly Gly 
Pro Me - #t Asp Glu Gly Pro Asp           330                 3 - #35                 3 - #40                 3 -      #45                                                                              - - CTT GAT CTA GGC CCT CCT GTA GAT CCA GAT GA - #A GAC TCT GAC AAC        AGT     1107                                                                    Leu Asp Leu Gly Pro Pro Val Asp Pro Asp Gl - #u Asp Ser Asp Asn Ser                          350  - #               355  - #               360              - - GCA ATT TAT GTA CAA GGA TTA AAT GAC AGT GT - #G ACT CTA GAT GAT CTG         1155                                                                       Ala Ile Tyr Val Gln Gly Leu Asn Asp Ser Va - #l Thr Leu Asp Asp Leu                       365      - #           370      - #           375                  - - GCA GAC TTC TTT AAG CAG TGT GGG GTT GTT AA - #G ATG AAC AAG AGA ACT         1203                                                                       Ala Asp Phe Phe Lys Gln Cys Gly Val Val Ly - #s Met Asn Lys Arg Thr                   380          - #       385          - #       390                      - - GGG CAA CCC ATG ATC CAC ATC TAC CTG GAC AA - #G GAA ACA GGA AAG CCC         1251                                                                       Gly Gln Pro Met Ile His Ile Tyr Leu Asp Ly - #s Glu Thr Gly Lys Pro               395              - #   400              - #   405                          - - AAA GGC GAT GCC ACA GTG TCC TAT GAA GAC CC - #A CCC ACT GCC AAG GCT         1299                                                                       Lys Gly Asp Ala Thr Val Ser Tyr Glu Asp Pr - #o Pro Thr Ala Lys Ala           410                 4 - #15                 4 - #20                 4 -      #25                                                                              - - GCC GTG GAA TGG TTT GAT GGG AAA GAT TTT CA - #A GGG AGC AAA CTT        AAA     1347                                                                    Ala 
Val Glu Trp Phe Asp Gly Lys Asp Phe Gl - #n Gly Ser Lys Leu Lys                          430  - #               435  - #               440              - - GTC TCC CTT GCT CGG AAG AAG CCT CCA ATG AA - #C AGT ATG CGG GGT GGT         1395                                                                       Val Ser Leu Ala Arg Lys Lys Pro Pro Met As - #n Ser Met Arg Gly Gly                       445      - #           450      - #           455                  - - CTG CCA CCC CGT GAG GGC AGA GGC ATG CCA CC - #A CCA CTC CGT GGA GGT         1443                                                                       Leu Pro Pro Arg Glu Gly Arg Gly Met Pro Pr - #o Pro Leu Arg Gly Gly                   460          - #       465          - #       470                      - - CCA GGA GGC CCA GGA GGT CCT GGG GGA CCC AT - #G GGT CGC ATG GGA GGC         1491                                                                       Pro Gly Gly Pro Gly Gly Pro Gly Gly Pro Me - #t Gly Arg Met Gly Gly               475              - #   480              - #   485                          - - CGT GGA GGA GAT AGA GGA GGC TTC CCT CCA AG - #A GGA CCC CGG GGT TCC         1539                                                                       Arg Gly Gly Asp Arg Gly Gly Phe Pro Pro Ar - #g Gly Pro Arg Gly Ser           490                 4 - #95                 5 - #00                 5 -      #05                                                                              - - CGA GGG AAC CCC TCT GGA GGA GGA AAC GTC CA - #G CAC CGA GCT GGA        GAC     1587                                                                    Arg Gly Asn Pro Ser Gly Gly Gly Asn Val Gl - #n His Arg Ala Gly Asp                          510  - #               515  - #               520              - - TGG CAG TGT CCC AAT CCG GGT TGT GGA AAC CA - #G AAC TTC GCC TGG AGA         1635                                                                       Trp Gln Cys Pro Asn Pro Gly Cys Gly Asn Gl - #n Asn 
Phe Ala Trp Arg                       525      - #           530      - #           535                  - - ACA GAG TGC AAC CAG TGT AAG GCC CCA AAG CC - #T GAA GGC TTC CTC CCG         1683                                                                       Thr Glu Cys Asn Gln Cys Lys Ala Pro Lys Pr - #o Glu Gly Phe Leu Pro                   540          - #       545          - #       550                      - - CCA CCC TTT CCG CCC CCG GGT GGT GAT CGT GG - #C AGA GGT GGC CCT GGT         1731                                                                       Pro Pro Phe Pro Pro Pro Gly Gly Asp Arg Gl - #y Arg Gly Gly Pro Gly               555              - #   560              - #   565                          - - GGC ATG CGG GGA GGA AGA GGT GGC CTC ATG GA - #T CGT GGT GGT CCC GGT         1779                                                                       Gly Met Arg Gly Gly Arg Gly Gly Leu Met As - #p Arg Gly Gly Pro Gly           570                 5 - #75                 5 - #80                 5 -      #85                                                                              - - GGA ATG TTC AGA GGT GGC CGT GGT GGA GAC AG - #A GGT GGC TTC CGT        GGT     1827                                                                    Gly Met Phe Arg Gly Gly Arg Gly Gly Asp Ar - #g Gly Gly Phe Arg Gly                          590  - #               595  - #               600              - - GGC CGG GGC ATG GAC CGA GGT GGC TTT GGT GG - #A GGA AGA CGA GGT GGC         1875                                                                       Gly Arg Gly Met Asp Arg Gly Gly Phe Gly Gl - #y Gly Arg Arg Gly Gly                       605      - #           610      - #           615                  - - CCT GGG GGG CCC CCT GGA CCT TTG ATG GAA CA - #G ATG GGA GGA AGA AGA         1923                                                                       Pro Gly Gly Pro Pro Gly Pro Leu Met Glu Gl - #n Met Gly Gly Arg Arg                   620          - 
#       625          - #       630                      - - GGA GGA CGT GGA GGA CCT GGA AAA ATG GAT AA - #A GGC GAG CAC CGT CAG         1971                                                                       Gly Gly Arg Gly Gly Pro Gly Lys Met Asp Ly - #s Gly Glu His Arg Gln               635              - #   640              - #   645                          - - GAG CGC AGA GAT CGG CCC TAC TAGATGCAGA GACCCCGCA - #G AGCTGCATTG            2022                                                                       Glu Arg Arg Asp Arg Pro Tyr                                                   650                 6 - #55                                                    - - ACTACCAGAT TTATTTTTTA AACCAGAAAA TGTTTTAAAT TTATAATTCC AT -             #ATTTATAA   2082                                                                 - - TGTTGGCCAC AACATTATGA TTATTCCTTG TCTGTACTTT AGTATTTTTC AC -            #CATTTGTG   2142                                                                 - - AAGAAACATT AAAACAAGTT AAATGGTAGT GTGCGGAGTT TTTTTTTCTT CC -            #TTCTTTTA   2202                                                                 - - AAAATGGTTG TTTAAGACTT TAACAATGGG AACCCCTTGT GAGCATGCTC AG -            #TATCATTG   2262                                                                 - - TGGAGAACCA AGAGGGCCTC TTAACTGTAA CAATGTTCAT GGTTGTGATG TT -            #TTTTTTTT   2322                                                                 - - TTTTTTAAAA TAAAATTCCA AATGTTTAAT AAAAAAAAAA AAAAAAAAA  - #                 2371                                                                        - -  - - (2) INFORMATION FOR SEQ ID NO:2:                                     - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 656 amino - #acids                                                (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                           
      - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:                               - - Met Ala Ser Thr Asp Tyr Ser Thr Tyr Ser Gl - #n Ala Ala Ala Gln Gln        1               5 - #                 10 - #                 15              - - Gly Tyr Ser Ala Tyr Thr Ala Gln Pro Thr Gl - #n Gly Tyr Ala Gln Thr                   20     - #             25     - #             30                  - - Thr Gln Ala Tyr Gly Gln Gln Ser Tyr Gly Th - #r Tyr Gly Gln Pro Thr               35         - #         40         - #         45                      - - Asp Val Ser Tyr Thr Gln Ala Gln Thr Thr Al - #a Thr Tyr Gly Gln Thr           50             - #     55             - #     60                          - - Ala Tyr Ala Thr Ser Tyr Gly Gln Pro Pro Th - #r Gly Tyr Thr Thr Pro       65                 - # 70                 - # 75                 - # 80       - - Thr Ala Pro Gln Ala Tyr Ser Gln Pro Val Gl - #n Gly Tyr Gly Thr Gly                       85 - #                 90 - #                 95              - - Ala Tyr Asp Thr Thr Thr Ala Thr Val Thr Th - #r Thr Gln Ala Ser Tyr                  100      - #           105      - #           110                  - - Ala Ala Gln Ser Ala Tyr Gly Thr Gln Pro Al - #a Tyr Pro Ala Tyr Gly              115          - #       120          - #       125                      - - Gln Gln Pro Ala Ala Thr Ala Pro Thr Arg Pr - #o Gln Asp Gly Asn Lys          130              - #   135              - #   140                          - - Pro Thr Glu Thr Ser Gln Pro Gln Ser Ser Th - #r Gly Gly Tyr Asn Gln      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Pro Ser Leu Gly Tyr Gly Gln Ser Asn Tyr Se - #r Tyr Pro Gln Val        Pro                                                                                             165  - #               170  
- #               175             - - Gly Ser Tyr Pro Met Gln Pro Val Thr Ala Pr - #o Pro Ser Tyr Pro Pro                  180      - #           185      - #           190                  - - Thr Ser Tyr Ser Ser Thr Gln Pro Thr Ser Ty - #r Asp Gln Ser Ser Tyr              195          - #       200          - #       205                      - - Ser Gln Gln Asn Thr Tyr Gly Gln Pro Ser Se - #r Tyr Gly Gln Gln Ser          210              - #   215              - #   220                          - - Ser Tyr Gly Gln Gln Ser Ser Tyr Gly Gln Gl - #n Pro Pro Thr Ser Tyr      225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - Pro Pro Gln Thr Gly Ser Tyr Ser Gln Ala Pr - #o Ser Gln Tyr Ser        Gln                                                                                             245  - #               250  - #               255             - - Gln Ser Ser Ser Tyr Gly Gln Gln Ser Ser Ph - #e Arg Gln Asp His Pro                  260      - #           265      - #           270                  - - Ser Ser Met Gly Val Tyr Gly Gln Glu Ser Gl - #y Gly Phe Ser Gly Pro              275          - #       280          - #       285                      - - Gly Glu Asn Arg Ser Met Ser Gly Pro Asp As - #n Arg Gly Arg Gly Arg          290              - #   295              - #   300                          - - Gly Gly Phe Asp Arg Gly Gly Met Ser Arg Gl - #y Gly Arg Gly Gly Gly      305                 3 - #10                 3 - #15                 3 -      #20                                                                              - - Arg Gly Gly Met Gly Ser Ala Gly Glu Arg Gl - #y Gly Phe Asn Lys        Pro                                                                                             325  - #               330  - #               335             - - Gly Gly Pro Met Asp Glu Gly Pro Asp Leu As - #p Leu Gly Pro Pro Val                  340  
    - #           345      - #           350                  - - Asp Pro Asp Glu Asp Ser Asp Asn Ser Ala Il - #e Tyr Val Gln Gly Leu              355          - #       360          - #       365                      - - Asn Asp Ser Val Thr Leu Asp Asp Leu Ala As - #p Phe Phe Lys Gln Cys          370              - #   375              - #   380                          - - Gly Val Val Lys Met Asn Lys Arg Thr Gly Gl - #n Pro Met Ile His Ile      385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - Tyr Leu Asp Lys Glu Thr Gly Lys Pro Lys Gl - #y Asp Ala Thr Val        Ser                                                                                             405  - #               410  - #               415             - - Tyr Glu Asp Pro Pro Thr Ala Lys Ala Ala Va - #l Glu Trp Phe Asp Gly                  420      - #           425      - #           430                  - - Lys Asp Phe Gln Gly Ser Lys Leu Lys Val Se - #r Leu Ala Arg Lys Lys              435          - #       440          - #       445                      - - Pro Pro Met Asn Ser Met Arg Gly Gly Leu Pr - #o Pro Arg Glu Gly Arg          450              - #   455              - #   460                          - - Gly Met Pro Pro Pro Leu Arg Gly Gly Pro Gl - #y Gly Pro Gly Gly Pro      465                 4 - #70                 4 - #75                 4 -      #80                                                                              - - Gly Gly Pro Met Gly Arg Met Gly Gly Arg Gl - #y Gly Asp Arg Gly        Gly                                                                                             485  - #               490  - #               495             - - Phe Pro Pro Arg Gly Pro Arg Gly Ser Arg Gl - #y Asn Pro Ser Gly Gly                  500      - #           505      - #           510                  - - Gly Asn Val Gln His Arg Ala Gly Asp Trp Gl - #n Cys Pro Asn 
Pro Gly              515          - #       520          - #       525                      - - Cys Gly Asn Gln Asn Phe Ala Trp Arg Thr Gl - #u Cys Asn Gln Cys Lys          530              - #   535              - #   540                          - - Ala Pro Lys Pro Glu Gly Phe Leu Pro Pro Pr - #o Phe Pro Pro Pro Gly      545                 5 - #50                 5 - #55                 5 -      #60                                                                              - - Gly Asp Arg Gly Arg Gly Gly Pro Gly Gly Me - #t Arg Gly Gly Arg        Gly                                                                                             565  - #               570  - #               575             - - Gly Leu Met Asp Arg Gly Gly Pro Gly Gly Me - #t Phe Arg Gly Gly Arg                  580      - #           585      - #           590                  - - Gly Gly Asp Arg Gly Gly Phe Arg Gly Gly Ar - #g Gly Met Asp Arg Gly              595          - #       600          - #       605                      - - Gly Phe Gly Gly Gly Arg Arg Gly Gly Pro Gl - #y Gly Pro Pro Gly Pro          610              - #   615              - #   620                          - - Leu Met Glu Gln Met Gly Gly Arg Arg Gly Gl - #y Arg Gly Gly Pro Gly      625                 6 - #30                 6 - #35                 6 -      #40                                                                              - - Lys Met Asp Lys Gly Glu His Arg Gln Glu Ar - #g Arg Asp Arg Pro        Tyr                                                                                             645  - #               650  - #               655             - -  - - (2) INFORMATION FOR SEQ ID NO:3:                                     - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 2938 base - #pairs                                                (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: double   
                                                   (D) TOPOLOGY: linear                                                 - -     (ix) FEATURE:                                                                  (A) NAME/KEY: CDS                                                             (B) LOCATION: 143..1498                                              - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:                               - - GGAGGGCGCT CGCAGGGGGC ACGCAGGGAG GGCCCAGGGC GCCAGGGAGG CC -             #GCGCCGGG     60                                                                 - - CTAATCCGAA GGGGCTGCGA GGTCAGGCTG TAACCGGGTC AATGTGTGGA AT -            #ATTGGGGG    120                                                                 - - GCTCGGCTGC AGACTTGGCC AA ATG GAC GGG ACT ATT AAG - # GAG GCT CTG       TCG     172                                                                                       - #       Met Asp Gly Thr Ile Lys G - #lu Ala Leu Ser                        - #         1         - #      5            - #            10                                                                               - - GTG GTG AGC GAC GAC CAG TCC CTC TTT GAC TC - #A GCG TAC GGA GCG        GCA      220                                                                    Val Val Ser Asp Asp Gln Ser Leu Phe Asp Se - #r Ala Tyr Gly Ala Ala                           15 - #                 20 - #                 25              - - GCC CAT CTC CCC AAG GCC GAC ATG ACT GCC TC - #G GGG AGT CCT GAC TAC          268                                                                       Ala His Leu Pro Lys Ala Asp Met Thr Ala Se - #r Gly Ser Pro Asp Tyr                        30     - #             35     - #             40                  - - GGG CAG CCC CAC AAG ATC AAC CCC CTC CCA CC - #A CAG CAG GAG TGG ATC          316                                                                       Gly Gln Pro His Lys Ile Asn Pro Leu Pro Pr - #o Gln Gln Glu Trp Ile                    
45         - #         50         - #         55                      - - AAT CAG CCA GTG AGG GTC AAC GTC AAG CGG GA - #G TAT GAC CAC ATG AAT          364                                                                       Asn Gln Pro Val Arg Val Asn Val Lys Arg Gl - #u Tyr Asp His Met Asn                60             - #     65             - #     70                          - - GGA TCC AGG GAG TCT CCG GTG GAC TGC AGC GT - #T AGC AAA TGC AGC AAG          412                                                                       Gly Ser Arg Glu Ser Pro Val Asp Cys Ser Va - #l Ser Lys Cys Ser Lys            75                 - # 80                 - # 85                 - # 90       - - CTG GTG GGC GGA GGC GAG TCC AAC CCC ATG AA - #C TAC AAC AGC TAT ATG          460                                                                       Leu Val Gly Gly Gly Glu Ser Asn Pro Met As - #n Tyr Asn Ser Tyr Met                            95 - #                100 - #                105              - - GAC GAG AAG AAT GGC CCC CCT CCT CCC AAC AT - #G ACC ACC AAC GAG AGG          508                                                                       Asp Glu Lys Asn Gly Pro Pro Pro Pro Asn Me - #t Thr Thr Asn Glu Arg                       110      - #           115      - #           120                  - - AGA GTC ATC GTC CCC GCA GAC CCC ACA CTG TG - #G ACA CAG GAG CAT GTG          556                                                                       Arg Val Ile Val Pro Ala Asp Pro Thr Leu Tr - #p Thr Gln Glu His Val                   125          - #       130          - #       135                      - - AGG CAA TGG CTG GAG TGG GCC ATA AAG GAG TA - #T AGC TTG ATG GAG ATC          604                                                                       Arg Gln Trp Leu Glu Trp Ala Ile Lys Glu Ty - #r Ser Leu Met Glu Ile               140              - #   145              - #   150                          - - GAC ACA TCC TTT TTC CAG AAC ATG GAT GGC AA - #G GAA 
CTG TGT AAA ATG          652                                                                       Asp Thr Ser Phe Phe Gln Asn Met Asp Gly Ly - #s Glu Leu Cys Lys Met           155                 1 - #60                 1 - #65                 1 -      #70                                                                              - - AAC AAG GAG GAC TTC CTC CGC GCC ACC ACC CT - #C TAC AAC ACG GAA        GTG      700                                                                    Asn Lys Glu Asp Phe Leu Arg Ala Thr Thr Le - #u Tyr Asn Thr Glu Val                          175  - #               180  - #               185              - - CTG TTG TCA CAC CTC AGT TAC CTC AGG GAA AG - #T TCA CTG CTG GCC TAT          748                                                                       Leu Leu Ser His Leu Ser Tyr Leu Arg Glu Se - #r Ser Leu Leu Ala Tyr                       190      - #           195      - #           200                  - - AAT ACA ACC TCC CAC ACC GAC CAA TCC TCA CG - #A TTG AGT GTC AAA GAA          796                                                                       Asn Thr Thr Ser His Thr Asp Gln Ser Ser Ar - #g Leu Ser Val Lys Glu                   205          - #       210          - #       215                      - - GAC CCT TCT TAT GAC TCA GTC AGA AGA GGA GC - #A TGG GGC AAT AAC ATG          844                                                                       Asp Pro Ser Tyr Asp Ser Val Arg Arg Gly Al - #a Trp Gly Asn Asn Met               220              - #   225              - #   230                          - - AAT TCT GGC CTC AAC AAA AGT CCT CCC CTT GG - #A GGG GCA CAA ACG ATC          892                                                                       Asn Ser Gly Leu Asn Lys Ser Pro Pro Leu Gl - #y Gly Ala Gln Thr Ile           235                 2 - #40                 2 - #45                 2 -      #50                                                                              - - AGT AAG AAT ACA GAG 
CAA CGG CCC CAG CCA GA - #T CCG TAT CAG ATC        CTG      940                                                                    Ser Lys Asn Thr Glu Gln Arg Pro Gln Pro As - #p Pro Tyr Gln Ile Leu                          255  - #               260  - #               265              - - GGC CCG ACC AGC AGT CGC CTA GCC AAC CCT GG - #A AGC GGG CAG ATC CAG          988                                                                       Gly Pro Thr Ser Ser Arg Leu Ala Asn Pro Gl - #y Ser Gly Gln Ile Gln                       270      - #           275      - #           280                  - - CTG TGG CAA TTC CTC CTG GAG CTG CTC TCC GA - #C AGC GCC AAC GCC AGC         1036                                                                       Leu Trp Gln Phe Leu Leu Glu Leu Leu Ser As - #p Ser Ala Asn Ala Ser                   285          - #       290          - #       295                      - - TGT ATC ACC TGG GAG GGG ACC AAC GGG GAG TT - #C AAA ATG ACG GAC CCC         1084                                                                       Cys Ile Thr Trp Glu Gly Thr Asn Gly Glu Ph - #e Lys Met Thr Asp Pro               300              - #   305              - #   310                          - - GAT GAG GTG GCC AGG CGC TGG GGC GAG CGG AA - #A AGC AAG CCC AAC ATG         1132                                                                       Asp Glu Val Ala Arg Arg Trp Gly Glu Arg Ly - #s Ser Lys Pro Asn Met           315                 3 - #20                 3 - #25                 3 -      #30                                                                              - - AAT TAC GAC AAG CTG AGC CGG GCC CTC CGT TA - #T TAC TAT GAT AAA        AAC     1180                                                                    Asn Tyr Asp Lys Leu Ser Arg Ala Leu Arg Ty - #r Tyr Tyr Asp Lys Asn                          335  - #               340  - #               345              - - ATT ATG ACC AAA GTG CAC GGC AAA AGA TAT GC - #T TAC AAA TTT GAC TTC    
     1228                                                                       Ile Met Thr Lys Val His Gly Lys Arg Tyr Al - #a Tyr Lys Phe Asp Phe                       350      - #           355      - #           360                  - - CAC GGC ATT GCC CAG GCT CTG CAG CCA CAT CC - #G ACC GAG TCG TCC ATG         1276                                                                       His Gly Ile Ala Gln Ala Leu Gln Pro His Pr - #o Thr Glu Ser Ser Met                   365          - #       370          - #       375                      - - TAC AAG TAC CCT TCT GAC ATC TCC TAC ATG CC - #T TCC TAC CAT GCC CAC         1324                                                                       Tyr Lys Tyr Pro Ser Asp Ile Ser Tyr Met Pr - #o Ser Tyr His Ala His               380              - #   385              - #   390                          - - CAG CAG AAG GTG AAC TTT GTC CCT CCC CAT CC - #A TCC TCC ATG CCT GTC         1372                                                                       Gln Gln Lys Val Asn Phe Val Pro Pro His Pr - #o Ser Ser Met Pro Val           395                 4 - #00                 4 - #05                 4 -      #10                                                                              - - ACT TCC TCC AGC TTC TTT GGA GCC GCA TCA CA - #A TAC TGG ACC TCC        CCC     1420                                                                    Thr Ser Ser Ser Phe Phe Gly Ala Ala Ser Gl - #n Tyr Trp Thr Ser Pro                          415  - #               420  - #               425              - - ACG GGG GGA ATC TAC CCC AAC CCC AAC GTC CC - #C CGC CAT CCT AAC ACC         1468                                                                       Thr Gly Gly Ile Tyr Pro Asn Pro Asn Val Pr - #o Arg His Pro Asn Thr                       430      - #           435      - #           440                  - - CAC GTG CCT TCA CAC TTA GGC AGC TAC TAC TA - #GAAGCTTA CTCATCAGTG           1518                                         
                              His Val Pro Ser His Leu Gly Ser Tyr Tyr                                               445          - #       450                                             - - GCCTTCTAGC TGAAGCCCAT CCTGCACACT TACTGGATGC TTTGGACTCA AC -             #AGGACATA   1578                                                                 - - TGTGGCCTTG AAGGGAAGAC AAAACTGGAT GTTCTTTCTT GTTGGATAGA AC -            #CTTTGTAT   1638                                                                 - - TTGTTCTTTA AAAACATTTT TTTTAATGTT GGTAACTTTT GCTTCCTCTA CC -            #TGAACAAA   1698                                                                 - - GAGATGAATA ATTCCATGGG CCAGTATGCC AGTTTGAATT CTCAGTCTCC TA -            #GCATCTTG   1758                                                                 - - TGAGTTGCAT ATTAAGATTA CTGGAATGGT TAAGTCATGG TTCTGAGAAA GA -            #AGCTGTAC   1818                                                                 - - GTTTTCTTTA TGTTTTTATG ACCAAAGCAG TTTCTTGTCA ATACACGGGG TT -            #CAGTATGA   1878                                                                 - - CACAGAATCA TGGACTTAAC CCGTCATGTT CTGGTTTGAG ATTTAGTGAC AA -            #ATAGAGGT   1938                                                                 - - GGGAAGCTTA TAATCTAATT TTAGGAGGAC CAAATTCAGT GGATGGCAAC TG -            #GAACATTG   1998                                                                 - - ATTGTAAGGC CAGTGAAGTT TTCACCCAAC TGGAATTTGA TGGAAAGAAG GT -            #TTGTGTGT   2058                                                                 - - TTAAGACGCC AAGGGCATTG CAGAATCCCT CTCAGTGGAC AGTATGCACT CA -            #GCTGACCA   2118                                                                 - - CTCTCTCTAG AAATAGTCAA GATATGAACT AAGAAATTTT AATGCAAATA CA -            #TACATTCC   2178                                                                 - - TGAAAGACGG GGAATTAAAT TACTAATTTT TTTTTTTTTT TAAATGATGA CA -            #GTGGTCCC   2238     
                                                            - - AGAACTTGGA AAAGTTGTAG GGATTTCTAA ACTCAAGCAG ATTCGCAAGT GC -            #TGTGCGCT   2298                                                                 - - TGTCAGACCA TCAGACCAGG GCCAACCAAT CAGAAGGCAA CTTACTGTAT AA -            #ATTATGCA   2358                                                                 - - GAGTTATTTT CCTATATCTC ACAGTATTAA AAATAAATAA TTAAAAATTA AG -            #AATAAATA   2418                                                                 - - AACGAGTTGA CCTCGGTCAC AAAAGCAGTT TTACTATCGA ATCAATCGCT GT -            #TATTTTTT   2478                                                                 - - TTAATGTAAT TTGTACATCT TTTTTCAATC TGTACATTTG GGCTGTCTGT AT -            #GTTTTTAT   2538                                                                 - - AGCTGGTTTT TAAAAAGCAT AATATGCCTA TAGCTGAAAA GGAAACAGGG CT -            #GTTTAAGT   2598                                                                 - - CACTGACTTA TGAGAAAGCA AAGCACTGGT ACAGTTATTT AACAGGCATA CA -            #CAAGCAGG   2658                                                                 - - GAAAGATAAT CCATTTAGAT CTTTAATGCT TTGGAAATGC GTGTAACAGT AC -            #TGCAATAA   2718                                                                 - - TCACAGCTCT GGGAAAAACA ACGAAACTTT CCCTTGTGGA GAGGAGGGAT TT -            #TCCTGCTC   2778                                                                 - - TATATAAGCA ACATATTTTT AGACATTAAA ATATATATAA TTTTGCAGGT AA -            #TTGTTGAC   2838                                                                 - - TTTTTTAACT ATATTAAGCG TTAAGCTGAC AACTGTCAAA GAAGACCATG TT -            #GTAAAATA   2898                                                                 - - ATTTGACTAA ATAAATGGTT CCTTCTCTCA AAAAAAAAAA     - #                      - #  2938                                                                     - -  - - (2) INFORMATION FOR SEQ ID NO:4:                            
         - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 452 amino - #acids                                                (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                 - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:                               - - Met Asp Gly Thr Ile Lys Glu Ala Leu Ser Va - #l Val Ser Asp Asp Gln        1               5 - #                 10 - #                 15              - - Ser Leu Phe Asp Ser Ala Tyr Gly Ala Ala Al - #a His Leu Pro Lys Ala                   20     - #             25     - #             30                  - - Asp Met Thr Ala Ser Gly Ser Pro Asp Tyr Gl - #y Gln Pro His Lys Ile               35         - #         40         - #         45                      - - Asn Pro Leu Pro Pro Gln Gln Glu Trp Ile As - #n Gln Pro Val Arg Val           50             - #     55             - #     60                          - - Asn Val Lys Arg Glu Tyr Asp His Met Asn Gl - #y Ser Arg Glu Ser Pro       65                 - # 70                 - # 75                 - # 80       - - Val Asp Cys Ser Val Ser Lys Cys Ser Lys Le - #u Val Gly Gly Gly Glu                       85 - #                 90 - #                 95              - - Ser Asn Pro Met Asn Tyr Asn Ser Tyr Met As - #p Glu Lys Asn Gly Pro                  100      - #           105      - #           110                  - - Pro Pro Pro Asn Met Thr Thr Asn Glu Arg Ar - #g Val Ile Val Pro Ala              115          - #       120          - #       125                      - - Asp Pro Thr Leu Trp Thr Gln Glu His Val Ar - #g Gln Trp Leu Glu Trp          130              - #   135              - #   140                          - - Ala Ile Lys Glu Tyr Ser Leu Met Glu Ile As - #p Thr Ser Phe Phe Gln      145                 1 - #50               
  1 - #55                 1 -      #60                                                                              - - Asn Met Asp Gly Lys Glu Leu Cys Lys Met As - #n Lys Glu Asp Phe        Leu                                                                                             165  - #               170  - #               175             - - Arg Ala Thr Thr Leu Tyr Asn Thr Glu Val Le - #u Leu Ser His Leu Ser                  180      - #           185      - #           190                  - - Tyr Leu Arg Glu Ser Ser Leu Leu Ala Tyr As - #n Thr Thr Ser His Thr              195          - #       200          - #       205                      - - Asp Gln Ser Ser Arg Leu Ser Val Lys Glu As - #p Pro Ser Tyr Asp Ser          210              - #   215              - #   220                          - - Val Arg Arg Gly Ala Trp Gly Asn Asn Met As - #n Ser Gly Leu Asn Lys      225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - Ser Pro Pro Leu Gly Gly Ala Gln Thr Ile Se - #r Lys Asn Thr Glu        Gln                                                                                             245  - #               250  - #               255             - - Arg Pro Gln Pro Asp Pro Tyr Gln Ile Leu Gl - #y Pro Thr Ser Ser Arg                  260      - #           265      - #           270                  - - Leu Ala Asn Pro Gly Ser Gly Gln Ile Gln Le - #u Trp Gln Phe Leu Leu              275          - #       280          - #       285                      - - Glu Leu Leu Ser Asp Ser Ala Asn Ala Ser Cy - #s Ile Thr Trp Glu Gly          290              - #   295              - #   300                          - - Thr Asn Gly Glu Phe Lys Met Thr Asp Pro As - #p Glu Val Ala Arg Arg      305                 3 - #10                 3 - #15                 3 -      #20                                                                              - - Trp Gly 
Glu Arg Lys Ser Lys Pro Asn Met As - #n Tyr Asp Lys Leu        Ser                                                                                             325  - #               330  - #               335             - - Arg Ala Leu Arg Tyr Tyr Tyr Asp Lys Asn Il - #e Met Thr Lys Val His                  340      - #           345      - #           350                  - - Gly Lys Arg Tyr Ala Tyr Lys Phe Asp Phe Hi - #s Gly Ile Ala Gln Ala              355          - #       360          - #       365                      - - Leu Gln Pro His Pro Thr Glu Ser Ser Met Ty - #r Lys Tyr Pro Ser Asp          370              - #   375              - #   380                          - - Ile Ser Tyr Met Pro Ser Tyr His Ala His Gl - #n Gln Lys Val Asn Phe      385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - Val Pro Pro His Pro Ser Ser Met Pro Val Th - #r Ser Ser Ser Phe        Phe                                                                                             405  - #               410  - #               415             - - Gly Ala Ala Ser Gln Tyr Trp Thr Ser Pro Th - #r Gly Gly Ile Tyr Pro                  420      - #           425      - #           430                  - - Asn Pro Asn Val Pro Arg His Pro Asn Thr Hi - #s Val Pro Ser His Leu              435          - #       440          - #       445                      - - Gly Ser Tyr Tyr                                                              450                                                                        - -  - - (2) INFORMATION FOR SEQ ID NO:5:                                     - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 328 base - #pairs                                                 (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: double                              
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..327

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

TCC TAC AGC CAA GCT CCA AGT CAA TAT AGC CAA CAG AGC AGC AGC TAC      48
Ser Tyr Ser Gln Ala Pro Ser Gln Tyr Ser Gln Gln Ser Ser Ser Tyr
1               5                   10                  15

GGG CAG CAG AAC CCT TCT TAT GAC TCA GTC AGA AGA GGA GCT TGG GGC      96
Gly Gln Gln Asn Pro Ser Tyr Asp Ser Val Arg Arg Gly Ala Trp Gly
            20                  25                  30

AAT AAC ATG AAT TCT GGC CTC AAC AAA AGT CCT CCC CTT GGA GGG GCA     144
Asn Asn Met Asn Ser Gly Leu Asn Lys Ser Pro Pro Leu Gly Gly Ala
        35                  40                  45

CAA ACG ATC AGT AAG AAT ACA GAG CAA CGG CCC CAG CCA GAT CCG TAT     192
Gln Thr Ile Ser Lys Asn Thr Glu Gln Arg Pro Gln Pro Asp Pro Tyr
    50                  55                  60

CAG ATC CTG GGC CCG ACC AGC AGT CGC CTA GCC AAC CCT GGA AGC GGG     240
Gln Ile Leu Gly Pro Thr Ser Ser Arg Leu Ala Asn Pro Gly Ser Gly
65                  70                  75                  80

CAG ATC CAG CTG TGG CAA TTC CTC CTG GAG CTG CTC TCC GAC AGC GCC     288
Gln Ile Gln Leu Trp Gln Phe Leu Leu Glu Leu Leu Ser Asp Ser Ala
                85                  90                  95

AAC GCC AGC TGT ATC ACC TGG GAG GGG ACC AAC GGG GAG T               328
Asn Ala Ser Cys Ile Thr Trp Glu Gly Thr Asn Gly Glu
            100                 105

(2) INFORMATION FOR SEQ ID NO:6:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 109 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Ser Tyr Ser Gln Ala Pro Ser Gln Tyr Ser Gln Gln Ser Ser Ser Tyr
1               5                   10                  15

Gly Gln Gln Asn Pro Ser Tyr Asp Ser Val Arg Arg Gly Ala Trp Gly
            20                  25                  30

Asn Asn Met Asn Ser Gly Leu Asn Lys Ser Pro Pro Leu Gly Gly Ala
        35                  40                  45

Gln Thr Ile Ser Lys Asn Thr Glu Gln Arg Pro Gln Pro Asp Pro Tyr
    50                  55                  60

Gln Ile Leu Gly Pro Thr Ser Ser Arg Leu Ala Asn Pro Gly Ser Gly
65                  70                  75                  80

Gln Ile Gln Leu Trp Gln Phe Leu Leu Glu Leu Leu Ser Asp Ser Ala
                85                  90                  95

Asn Ala Ser Cys Ile Thr Trp Glu Gly Thr Asn Gly Glu
            100                 105

(2) INFORMATION FOR SEQ ID NO:7:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 86 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Ile Tyr Val Gln Gly Leu Asn Asp Ser Val Thr Leu Asp Asp Leu Ala
1               5                   10                  15

Asp Phe Phe Lys Gln Cys Gly Val Val Lys Met Asn Lys Arg Thr Gly
            20                  25                  30

Gln Pro Met Ile His Ile Tyr Leu Asp Lys Glu Thr Gly Lys Pro Lys
        35                  40                  45

Gly Asp Ala Thr Val Ser Tyr Glu Asp Pro Pro Thr Ala Lys Ala Ala
    50                  55                  60

Val Glu Trp Phe Asp Gly Lys Asp Phe Gln Gly Ser Lys Leu Lys Val
65                  70                  75                  80

Ser Leu Ala Arg Lys Lys
                85

(2) INFORMATION FOR SEQ ID NO:8:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 86 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Ile Phe Val Ser Gly Met Asp Pro Ser Thr Thr Glu Gln Asp Ile Glu
1               5                   10                  15

Thr His Phe Gly Ala Ile Gly Ile Ile Lys Lys Asp Lys Arg Thr Met
            20                  25                  30

Lys Pro Lys Ile Trp Leu Tyr Lys Asn Lys Glu Thr Gly Ala Ser Lys
        35                  40                  45

Gly Glu Ala Thr Val Thr Tyr Asp Asp Thr Asn Ala Ala Gln Ser Ala
    50                  55                  60

Ile Glu Trp Phe Asp Gly Arg Xaa Phe Asn Gly Asn Ala Ile Lys Val
65                  70                  75                  80

Ser Leu Ala Gln Arg Gln
                85

(2) INFORMATION FOR SEQ ID NO:9:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 72 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Leu Phe Val Lys Gly Leu Ser Glu Asp Thr Thr Glu Glu Thr Leu Lys
1               5                   10                  15

Glu Ser Phe Asp Gly Ser Val Arg Ala Arg Ile Val Thr Asp Arg Glu
            20                  25                  30

Thr Gly Ser Ser Lys Gly Phe Gly Phe Val Asp Phe Asn Ser Glu Glu
        35                  40                  45

Asp Ala Lys Glu Ala Met Glu Asp Gly Glu Ile Asp Gly Asn Lys Val
    50                  55                  60

Thr Leu Asp Trp Ala Lys Pro Lys
65                  70

(2) INFORMATION FOR SEQ ID NO:10:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 78 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Leu Phe Val Ala Arg Val Asn Tyr Asp Thr Thr Glu Ser Lys Leu Arg
1               5                   10                  15

Arg Glu Phe Glu Val Tyr Gly Pro Ile Lys Arg Ile His Met Val Tyr
            20                  25                  30

Ser Lys Arg Ser Gly Lys Pro Arg Gly Tyr Ala Phe Ile Glu Tyr Glu
        35                  40                  45

His Glu Arg Asp Met His Ser Ala Tyr Lys His Ala Asp Gly Lys Lys
    50                  55                  60

Ile Asp Gly Arg Arg Val Leu Val Asp Val Glu Arg Gly Arg
65                  70                  75

(2) INFORMATION FOR SEQ ID NO:11:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 74 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Val Tyr Ile Lys Asn Phe Gly Glu Asp Met Asp Asp Glu Arg Leu Lys
1               5                   10                  15

Asp Leu Phe Gly Pro Ala Leu Ser Val Lys Val Met Thr Asp Glu Ser
            20                  25                  30

Gly Lys Ser Lys Gly Phe Gly Phe Val Ser Phe Glu Arg His Glu Asp
        35                  40                  45

Ala Gln Lys Ala Val Asp Glu Met Asn Gly Lys Glu Leu Asn Gly Lys
    50                  55                  60

Gln Ile Tyr Val Gly Arg Ala Gln Lys Lys
65                  70

(2) INFORMATION FOR SEQ ID NO:12:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 77 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Ile Phe Val Gly Gly Ile Lys Glu Asp Thr Glu Glu His His Leu Arg
1               5                   10                  15

Asp Tyr Phe Glu Gln Tyr Gly Lys Ile Glu Val Ile Glu Ile Met Thr
            20                  25                  30

Asp Arg Gly Ser Gly Lys Lys Arg Gly Phe Ala Phe Val Thr Phe Asp
        35                  40                  45

Asp His Asp Ser Val Asp Lys Ile Val Ile Gln Lys Tyr His Thr Val
    50                  55                  60

Asn Gly His Asn Cys Glu Val Arg Lys Ala Leu Ser Lys
65                  70                  75

(2) INFORMATION FOR SEQ ID NO:13:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 77 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Leu Phe Val Gly Gly Ile Lys Glu Asp Thr Glu Glu His His Leu Arg
1               5                   10                  15

Asp Tyr Phe Glu Glu Tyr Gly Lys Ile Asp Thr Ile Glu Ile Ile Thr
            20                  25                  30

Asp Arg Gln Ser Gly Lys Lys Arg Gly Phe Gly Phe Val Thr Phe Asp
        35                  40                  45

Asp His Asp Pro Val Asp Lys Ile Val Leu Gln Lys Tyr His Thr Ile
    50                  55                  60

Asn Gly His Asn Ala Glu Val Arg Lys Ala Leu Ser Arg
65                  70                  75

(2) INFORMATION FOR SEQ ID NO:14:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 67 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

GAGAACGAGG AGGAAGGAGA GAAAATGGCG TCCACGGGTG AGTATGGTGG AACTGCGGTC    60
GCGCCGG                                                              67

(2) INFORMATION FOR SEQ ID NO:15:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

ACACTATTTT TCCTCCTTGT TTTCCTCTAG ATTACAGTAC C    41

(2) INFORMATION FOR SEQ ID NO:16:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

GCGCAGCAGG GGTAAGTCAG TCTTTTATAA CCGTATTTTG T    41

(2) INFORMATION FOR SEQ ID NO:17:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

TTCAAGTTAT TGCATTTAAT TCTTTTGCAG CTACAGTGCT    40

(2) INFORMATION FOR SEQ ID NO:18:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 39 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

ACCACCCAGG TAATCTTTAA AATAATTACA TGTAGCTGC    39

(2) INFORMATION FOR SEQ ID NO:19:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 39 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

GTTCTTGCAT TCGGTTTTTT TTTGGAGCAG GCATATGGG    39

(2) INFORMATION FOR SEQ ID NO:20:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

CCTCCCACTG GTAAGGCCTG CCTTGGAGAG ATTTTTGGGT    40

(2) INFORMATION FOR SEQ ID NO:21:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

GAAATCTGAT GCAGCTCCCC TTTGGTCTAG GTTATACTAC T    41

(2) INFORMATION FOR SEQ ID NO:22:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

GCACCTACAA GGTAAGGCCA TGGTGTCCTT AATGCGTCAG T    41

(2) INFORMATION FOR SEQ ID NO:23:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

TTTAATTTTA TTTATTATTT CTCCTCTTAG ACCGCAGGAT    40

(2) INFORMATION FOR SEQ ID NO:24:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

CCTCCTACCA GGTCAGTCTA CTTTTTGTGG CAAAACAAAA A    41

(2) INFORMATION FOR SEQ ID NO:25:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:

TTTTTTTTTT CTCCTTCCTC TCTCTTTCAG CTATTCCTCT    40

(2) INFORMATION FOR SEQ ID NO:26:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

GGGCAGCAGA GTGAGTTGCT AAGAGAGAAA ACCAAATAAG    40

(2) INFORMATION FOR SEQ ID NO:27:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

CATGGCTTAC AGATGTGACT CTTTCCTCAG GTTCATTCCG A    41

(2) INFORMATION FOR SEQ ID NO:28:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 38 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

GGAATGGGGT AAGAGCAAAC CTTTTCTCCT TTTACCTA    38

(2) INFORMATION FOR SEQ ID NO:29:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

AAGGCCTTCA TTTCTCGTTT ATCCCCCCAG CAGCGCTGGA    40

(2) INFORMATION FOR SEQ ID NO:30:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

AAGCCTGGTG GTAAGTTTTT GAGTATTACC ATAGATAGTG    40

(2) INFORMATION FOR SEQ ID NO:31:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

ATATTTTATA TGATCTTTCC TGGTTGGCAG GACCCATGGA T    41

(2) INFORMATION FOR SEQ ID NO:32:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

CTTGATCTAG GTAAGTTGAA TTCCTAGTTG TGCCTTCCAT    40

(2) INFORMATION FOR SEQ ID NO:33:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

ATAATTCTCC TGTCTTGTTG TCTCTGAAAG GCCCACCTGT A    41

(2) INFORMATION FOR SEQ ID NO:34:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

GGGGTTGTTA AGGTCAGTAA AAGCATAACC AGGTCATCTG GC    42

(2) INFORMATION FOR SEQ ID NO:35:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

TCATGCCTAA CTATGCTATT CTTTGTCTAG ATGAACAAGA GA    42

(2) INFORMATION FOR SEQ ID NO:36:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 43 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

GAATGGTTTG ATGGTGAGAT GTACTCACTG GCATTCTTAA TCT    43

(2) INFORMATION FOR SEQ ID NO:37:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

AGTAATTGAT GTTCTGTTGT CTTGTTCCAG GGAAAGATTT T    41

(2) INFORMATION FOR SEQ ID NO:38:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 43 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

CCACTCCGTG GAGGTACTTT TACTGAGCTC CTATGTTGCA TTA    43

(2) INFORMATION FOR SEQ ID NO:39:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 44 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

ATTTGCTGTT TCTTGTTGTT CTTGTTGTAG GTCCAGGAGG CCCA    44

(2) INFORMATION FOR SEQ ID NO:40:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 38 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

CCCAATCCGT ATGTACTTGT CTGGGAAAAT TGATACCC    38

(2) INFORMATION FOR SEQ ID NO:41:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 43 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

TGATTTCTGC TGTGATGTAA TTGTATGCAG GGGTTGTGGA AAC    43

(2) INFORMATION FOR SEQ ID NO:42:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 37 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

CCCCCGGGTA GGTGCAGGTT TCATGAGTGT CCCCTCA    37

(2) INFORMATION FOR SEQ ID NO:43:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

ACTGCTTTCG CCCTGCTATT CTCACCTTAG GTGGTGATCG T    41

(2) INFORMATION FOR SEQ ID NO:44:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 38 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

ATGGATAAGT AAGTGCTGGT GAAAAGCAGC TGTGGGCC    38

(2) INFORMATION FOR SEQ ID NO:45:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 425 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

TCTAACCGAA GGGCCCTCTT TACCTTGCAG AGGCGAGCAC CGTCAGGAGC GCAGAGATCG    60
GCCCTACTAG ATGCAGAGAC CCCGCAGAGC TGCATTGACT ACCAGATTTA TTTTTTAAAC   120
CAGAAAATGT TTTAAATTTA TAATTCCATA TTTATAATGT TGGCCACAAC ATTATGATTA   180
TTCCTTGTCT GTACTTTAGT ATTTTTCACC ATTTGTGAAG AAACATTAAA ACAAGTTAAA   240
TGGTAGTGTG CGGAGTTTTT TTTTCTTCCT TCTTTTAAAA ATGGTTGTTT AAGACTTTAA   300
CAATGGGAAC CCCTTGTGAG CATGCTCAGT ATCATTGTGG AGAACCAAGA GGGCCTCTTA   360
ACTGTAACAA TGTTCATGGT TGTGATGTTT TTTTTTTTTT TTTAAATAAA ATTCCAAATG   420
TTTAT                                                               425

(2) INFORMATION FOR SEQ ID NO:46:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 69 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

CTCGGCTGCA GACTTGGCCA AATGGACGGG ACTATTAAGG TAAGCGGCGG GGCAACGGAC    60
GCGGGCGGC                                                            69

(2) INFORMATION FOR SEQ ID NO:47:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:
ACCCGGGGAT CCTCTAGAGT CGACCTGCAG GAGGCTCTGT CG                          42

(2) INFORMATION FOR SEQ ID NO:48:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:
GAATGGATCC AGGTAAGCTC ACCAGGCCTG TGCAGGATTG GG                          42

(2) INFORMATION FOR SEQ ID NO:49:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:
CCTTGGGCTT TTGCCCCCTC CTCACTTTAG GGAGTCTCCG GT                          42

(2) INFORMATION FOR SEQ ID NO:50:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:
TCGTCCCCGC AGGTAATTCG AGAACCAGGC TGCCTGGGCG CC                          42

(2) INFORMATION FOR SEQ ID NO:51:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:
TCCTTGCTAA CAACGTCTTC TCCTCTGCAG ACCCCACACT GT                          42

(2) INFORMATION FOR SEQ ID NO:52:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:
ACCTCAGGGA AAGTAAGTGC CGCCCAAGTA CCCAGGGCTG GG                          42

(2) INFORMATION FOR SEQ ID NO:53:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:
GTTATAACCT GTTTATGTTT TGCCTCTCAG GTTCACTGCT GG                          42

(2) INFORMATION FOR SEQ ID NO:54:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:
GTGTCAAAGA AGGTAAGTTT GTTCTTTTGT GCACTTAAAA TT                          42

(2) INFORMATION FOR SEQ ID NO:55:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:
AATGTACCCC TATTTGTTAT TGTTCATTAG ACCCTTCTTA TG                          42

(2) INFORMATION FOR SEQ ID NO:56:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:
GCCTCAACAA AAGTAAGTAA ATGTTTTATA GTTCTTTGGA GG                          42

(2) INFORMATION FOR SEQ ID NO:57:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:
CTCACTGCAT TTCTTTCCCT CTTGCCACAG GTCCTCCCCT TG                          42

(2) INFORMATION FOR SEQ ID NO:58:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:
GGCCCCAGCC AGGTACCTGC CCAGGATATG TAATCTCTCC TT                          42

(2) INFORMATION FOR SEQ ID NO:59:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:
TGAAGCAAAT TTCCTTTTTT ATTTCCTTAG ATCCGTATCA GA                          42

(2) INFORMATION FOR SEQ ID NO:60:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:60:
TAGCCAACCC TGGTGAGTTT ACCTTGGCCT GCAAGCCTTT TT                          42

(2) INFORMATION FOR SEQ ID NO:61:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:
TGTTCTCTCC CGTTTCCTCA CGGCGTGCAG GAAGCGGGCA GA                          42

(2) INFORMATION FOR SEQ ID NO:62:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:62:
AGCTACTACT AGAAGCTTAC TCATCAGTGG CCTTCTAGCT GA                          42

(2) INFORMATION FOR SEQ ID NO:63:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:63:
AGC TAC GGG CAG CAG AAT CCG TAT CAG ATC CTG                             33
Ser Tyr Gly Gln Gln Asn Pro Tyr Gln Ile Leu
1               5                   10

(2) INFORMATION FOR SEQ ID NO:64:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:64:
Ser Tyr Gly Gln Gln Asn Pro Tyr Gln Ile Leu
1               5                   10

(2) INFORMATION FOR SEQ ID NO:65:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:65:
AGC TAC GGG CAG CAG AAC CCT TCT TAT GAC TCA                             33
Ser Tyr Gly Gln Gln Asn Pro Ser Tyr Asp Ser
1               5                   10

(2) INFORMATION FOR SEQ ID NO:66:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:66:
Ser Tyr Gly Gln Gln Asn Pro Ser Tyr Asp Ser
1               5                   10

(2) INFORMATION FOR SEQ ID NO:67:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:67:
AGC TAC GGG CAG CAG AGT TCA CTG CTG GCC TAT                             33
Ser Tyr Gly Gln Gln Ser Ser Leu Leu Ala Tyr
1               5                   10

(2) INFORMATION FOR SEQ ID NO:68:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:68:
Ser Tyr Gly Gln Gln Ser Ser Leu Leu Ala Tyr
1               5                   10

(2) INFORMATION FOR SEQ ID NO:69:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:69:
TTC AAT AAG CCT GGT GGT CCT CCC CTT GGA GGG                             33
Phe Asn Lys Pro Gly Gly Pro Pro Leu Gly Gly
1               5                   10

(2) INFORMATION FOR SEQ ID NO:70:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:70:
Phe Asn Lys Pro Gly Gly Pro Pro Leu Gly Gly
1               5                   10

(2) INFORMATION FOR SEQ ID NO:71:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:71:
TTC AAT AAG CCT GGT GAC CCC ACA CTG TGG ACA                             33
Phe Asn Lys Pro Gly Asp Pro Thr Leu Trp Thr
1               5                   10

(2) INFORMATION FOR SEQ ID NO:72:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:72:
Phe Asn Lys Pro Gly Asp Pro Thr Leu Trp Thr
1               5                   10

(2) INFORMATION FOR SEQ ID NO:73:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:73:
CCA GAT CTT GAT CTA GAT CCG TAT CAG ATC CTG                             33
Pro Asp Leu Asp Leu Asp Pro Tyr Gln Ile Leu
1               5                   10

(2) INFORMATION FOR SEQ ID NO:74:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:74:
Pro Asp Leu Asp Leu Asp Pro Tyr Gln Ile Leu
1               5                   10

(2) INFORMATION FOR SEQ ID NO:75:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:75:
CCA GAT CTT GAT CTA GAC CCT TCT TAT GAC TCA                             33
Pro Asp Leu Asp Leu Asp Pro Ser Tyr Asp Ser
1               5                   10

(2) INFORMATION FOR SEQ ID NO:76:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:76:
Pro Asp Leu Asp Leu Asp Pro Ser Tyr Asp Ser
1               5                   10

(2) INFORMATION FOR SEQ ID NO:77:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:77:
CCA GAT CTT GAT CTA GGT TCA CTG CTG GCC TAT                             33
Pro Asp Leu Asp Leu Gly Ser Leu Leu Ala Tyr
1               5                   10

(2) INFORMATION FOR SEQ ID NO:78:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:78:
Pro Asp Leu Asp Leu Gly Ser Leu Leu Ala Tyr
1               5                   10

(2) INFORMATION FOR SEQ ID NO:79:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 78 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ix) FEATURE:
                       (A) NAME/KEY: CDS                                                             (B) LOCATION: 1..78                                                  - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:79:                              - - GGA CGC GGT GGA ATG GGG TTC CTG GAG CTT AG - #G ATG TGT GAT GCA GAA           48                                                                       Gly Arg Gly Gly Met Gly Phe Leu Glu Leu Ar - #g Met Cys Asp Ala Glu             1               5 - #                 10 - #                 15              - - GAA GTC TGG AAA GGT CCT CCC CTT GGA GGG  - #                  - #               78                                                                     Glu Val Trp Lys Gly Pro Pro Leu Gly Gly                                                    20     - #             25                                         - -  - - (2) INFORMATION FOR SEQ ID NO:80:                                    - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 26 amino - #acids                                                 (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                 - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:80:                              - - Gly Arg Gly Gly Met Gly Phe Leu Glu Leu Ar - #g Met Cys Asp Ala Glu        1               5 - #                 10 - #                 15              - - Glu Val Trp Lys Gly Pro Pro Leu Gly Gly                                               20     - #             25                                         - -  - - (2) INFORMATION FOR SEQ ID NO:81:                                    - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 33 base - #pairs                                                  (B) TYPE: nucleic acid     
                                                   (C) STRANDEDNESS: double                                                      (D) TOPOLOGY: linear                                                 - -     (ix) FEATURE:                                                                  (A) NAME/KEY: CDS                                                             (B) LOCATION: 1..33                                                  - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:81:                              - - AGC TAC GGG CAG CAG AAC CCT TCT TAT GAC TC - #A                  -      #         33                                                                    Ser Tyr Gly Gln Gln Asn Pro Ser Tyr Asp Se - #r                                 1               5 - #                 10                                     - -  - - (2) INFORMATION FOR SEQ ID NO:82:                                    - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 11 amino - #acids                                                 (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                 - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:82:                              - - Ser Tyr Gly Gln Gln Asn Pro Ser Tyr Asp Se - #r                            1               5 - #                 10                                     - -  - - (2) INFORMATION FOR SEQ ID NO:83:                                    - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 30 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: double                                                      (D) TOPOLOGY: linear                                                 - -    
    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..27

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:83:

GGA CGC GGT GGA ATG GGA CCC TTC TTA TGA    30
Gly Arg Gly Gly Met Gly Pro Phe Leu
1               5

(2) INFORMATION FOR SEQ ID NO:84:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 9 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:84:

Gly Arg Gly Gly Met Gly Pro Phe Leu
1               5

(2) INFORMATION FOR SEQ ID NO:85:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:85:

TTC AAT AAG CCT GGT GGT CCT CCC CTT GGA GGG    33
Phe Asn Lys Pro Gly Gly Pro Pro Leu Gly Gly
1               5                   10

(2) INFORMATION FOR SEQ ID NO:86:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:86:

Phe Asn Lys Pro Gly Gly Pro Pro Leu Gly Gly
1               5                   10

(2) INFORMATION FOR SEQ ID NO:87:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 51 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..48

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:87:

GGA CGC GGT GGA ATG GGG TCC TCC CCT TGG AGG GGC ACA AAC GAT CAG   48
Gly Arg Gly Gly Met Gly Ser Ser Pro Trp Arg Gly Thr Asn Asp Gln
1               5                   10                  15

TAA    51

(2) INFORMATION FOR SEQ ID NO:88:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 16 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:88:

Gly Arg Gly Gly Met Gly Ser Ser Pro Trp Arg Gly Thr Asn Asp Gln
1               5                   10                  15

(2) INFORMATION FOR SEQ ID NO:89:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 51 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..48

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:89:

GGA CGC GGT GGA ATG GGG TCC TCC CCT TGG AGG GGC ACA AAC GAT CAG   48
Gly Arg Gly Gly Met Gly Ser Ser Pro Trp Arg Gly Thr Asn Asp Gln
1               5                   10                  15

TAA    51

(2) INFORMATION FOR SEQ ID NO:90:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 16 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:90:

Gly Arg Gly Gly Met Gly Ser Ser Pro Trp Arg Gly Thr Asn Asp Gln
1               5                   10                  15

(2) INFORMATION FOR SEQ ID NO:91:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 30 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..27

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:91:

GGA CGC GGT GGA ATG GGG ACC CAT GGA TGA    30
Gly Arg Gly Gly Met Gly Thr His Gly
1               5

(2) INFORMATION FOR SEQ ID NO:92:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 9 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:92:

Gly Arg Gly Gly Met Gly Thr His Gly
1               5

(2) INFORMATION FOR SEQ ID NO:93:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:93:

CCA GAT CTT GAT CTA GAT CCG TAT CAG ATC CTG    33
Pro Asp Leu Asp Leu Asp Pro Tyr Gln Ile Leu
1               5                   10

(2) INFORMATION FOR SEQ ID NO:94:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:94:

Pro Asp Leu Asp Leu Asp Pro Tyr Gln Ile Leu
1               5                   10

(2) INFORMATION FOR SEQ ID NO:95:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 30 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..27

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:95:

GGA CGC GGT GGA ATG GGG ACC CAT GGA TGA    30
Gly Arg Gly Gly Met Gly Thr His Gly
1               5

(2) INFORMATION FOR SEQ ID NO:96:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 9 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:96:

Gly Arg Gly Gly Met Gly Thr His Gly
1               5

(2) INFORMATION FOR SEQ ID NO:97:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 833 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:97:

CTGCAGAGCG CGCCCAGGCA ACCCCGAAAG GCCGGTCGGG GACCCCGGCT GGGAGTCAGG   60
ACTCTAGCTC CCGGGCGCGA CCCGAGAACC CTGAATCCAT TCCGCGCACA CCCGGCACGC  120
GTGACCCCTG CCGACCGGCT GGCGCGCCAC CCATTCCCCG CGGCCCGCGG ATTAGTCAGC  180
AGTTGTTCTA GTCCGGGTCC CTTCCCCCAG CCCTCCCGCC GATCTCCGTC TCCCTGCAGG  240
GCCGACTCTT CAGCGACCGT CCCTAGAGCC AGCGGACGGA ACCATTCCAA ACAGCCTAGT  300
CTCGTGCTGA GAGCCTCTCC GGTTTCACGC TGAGACCCGC TCACCCCCGC TCTGGCCCCT  360
TAGATGCTAT TTTGGCCCGA GTGTCACGTC GGGCGCTCTT TAGAGAGGAC TGGGACAAGA  420
GTTGCGGACG CGAAGAACGA GTAAGCGGTG GTTCATCCCT CCTGACCCCA CCCCCGTGGC  480
CTGGCCCGAT GGTCGCGCCC GGGGTTGCGA GATTTGCGCC TGCGCAGTGC GGCGCCTAGA  540
GGGAAAGCGA GAGGGAGACG GACGTTGAGA GAACGAGGAG GAAGGAGAGA AAATGGCGTC  600
CACGGGTGAG TATGGTGGAA CTGCGGTCGC GCCGGCGGTA GCCGGAACGC CCAAACTGGG  660
GGTCGTTCGT CTCTGGGCTT GGCTGGGAAG ACTGAGTGGA GTTGCCGAGA GGGGGTTGAG  720
GCACCCGCCG CGGCCCGACG AGCTCGGGGA TCCGCATTCC TCTCCCCTCC CCCAACCGGG  780
CGGGCCGGTT CTGGAATCTT CCCGCGCCCT CGCGCGCGGG GGGCTTTGCT TTT  833

(2) INFORMATION FOR SEQ ID NO:98:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:98:

AGC TAC GGG CAG CAG AGC AGT GGC CAG ATC CAG    33
Ser Tyr Gly Gln Gln Ser Ser Gly Gln Ile Gln
1               5                   10

(2) INFORMATION FOR SEQ ID NO:99:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:99:

Ser Tyr Gly Gln Gln Ser Ser Gly Gln Ile Gln
1               5                   10

(2) INFORMATION FOR SEQ ID NO:100:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:100:

AGC TAC GGG CAG CAG AAT CCT TAT CAG ATT CTT    33
Ser Tyr Gly Gln Gln Asn Pro Tyr Gln Ile Leu
1               5                   10
(2) INFORMATION FOR SEQ ID NO:101:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:101:

Ser Tyr Gly Gln Gln Asn Pro Tyr Gln Ile Leu
1               5                   10

(2) INFORMATION FOR SEQ ID NO:102:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:102:

AGC TAC GGG CAG CAG AAT TTA CCA TAT GAG CCC    33
Ser Tyr Gly Gln Gln Asn Leu Pro Tyr Glu Pro
1               5                   10

(2) INFORMATION FOR SEQ ID NO:103:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:103:

Ser Tyr Gly Gln Gln Asn Leu Pro Tyr Glu Pro
1               5                   10

(2) INFORMATION FOR SEQ ID NO:104:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..33

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:104:

CCA GAT CTT GAT CTA GAT TTA CCA TAT GAG CCC    33
Pro Asp Leu Asp Leu Asp Leu Pro Tyr Glu Pro
1               5                   10

(2) INFORMATION FOR SEQ ID NO:105:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 11 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:105:

Pro Asp Leu Asp Leu Asp Leu Pro Tyr Glu Pro
1               5                   10

(2) INFORMATION FOR SEQ ID NO:106:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 954 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..885

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:106:

CCC ACT AGT TAC CCA CCC CAA ACT GGA TCC TAC AGC CAA GCT CCA AGT   48
Pro Thr Ser Tyr Pro Pro Gln Thr Gly Ser Tyr Ser Gln Ala Pro Ser
1               5                   10                  15

CAA TAT AGC CAA CAG AGC AGC AGC TAC GGG CAG CAG AGT TCA TTC CGA   96
Gln Tyr Ser Gln Gln Ser Ser Ser Tyr Gly Gln Gln Ser Ser Phe Arg
            20                  25                  30

CAG GAC CAC CCC AGT AGC ATG GGT GTT TAT GGG CAG GAG TCT GGA GGA  144
Gln Asp His Pro Ser Ser Met Gly Val Tyr Gly Gln Glu Ser Gly Gly
        35                  40                  45

TTT TCC GGA CCA GGA GAG AAC CGG AGC ATG AGT GGC CCT GAT AAC CGG  192
Phe Ser Gly Pro Gly Glu Asn Arg Ser Met Ser Gly Pro Asp Asn Arg
    50                  55                  60

GGC AGG GGA AGA GGG GGA TTT GAT CGT GGA GGC ATG AGC AGA GGT GGG  240
Gly Arg Gly Arg Gly Gly Phe Asp Arg Gly Gly Met Ser Arg Gly Gly
65                  70                  75                  80

CGG GGA GGA GGA CGC GGT GGA ATG GGA AAA ATT TTG AAA GAC TTA TCT  288
Arg Gly Gly Gly Arg Gly Gly Met Gly Lys Ile Leu Lys Asp Leu Ser
                85                  90                  95

TCT GAA GAT ACA CGG GGC AGA AAA GGA GAC GGA GAA AAT TCT GGA GTT  336
Ser Glu Asp Thr Arg Gly Arg Lys Gly Asp Gly Glu Asn Ser Gly Val
            100                 105                 110

TCT GCT GCT GTC ACT TCT ATG TCT GTT CCA ACT CCC ATC TAT CAG ACT  384
Ser Ala Ala Val Thr Ser Met Ser Val Pro Thr Pro Ile Tyr Gln Thr
        115                 120                 125

AGC AGC GGA CAG TAC ATT GCC ATT GCC CCA AAT GGA GCC TTA CAG TTG  432
Ser Ser Gly Gln Tyr Ile Ala Ile Ala Pro Asn Gly Ala Leu Gln Leu
    130                 135                 140

GCA AGT CCA GGC ACA GAT GGA GTA CAG GGA CTT CAG ACA TTA ACC ATG  480
Ala Ser Pro Gly Thr Asp Gly Val Gln Gly Leu Gln Thr Leu Thr Met
145                 150                 155                 160

ACA AAT TCA GGC AGT ACT CAG CAA GGT ACA ACT ATT CTT CAG TAT GCA  528
Thr Asn Ser Gly Ser Thr Gln Gln Gly Thr Thr Ile Leu Gln Tyr Ala
                165                 170                 175

CAG ACC TCT GAT GGA CAG CAG ATA CTT GTG CCC AGC AAT CAG GTG GTC  576
Gln Thr Ser Asp Gly Gln Gln Ile Leu Val Pro Ser Asn Gln Val Val
            180                 185                 190

GTA CAA ACT GCA TCA GGA GAT ATG CAA ACA TAT CAG ATC CGA ACT ACA  624
Val Gln Thr Ala Ser Gly Asp Met Gln Thr Tyr Gln Ile Arg Thr Thr
        195                 200                 205

CCT TCA GCT ACT TCT CTG CCA CAA ACT GTG GTG ATG ACA TCT CCT GTG  672
Pro Ser Ala Thr Ser Leu Pro Gln Thr Val Val Met Thr Ser Pro Val
    210                 215                 220

ACT CTC ACC TCT CAG ACA ACT AAG ACA GAT GAC CCC CAA TTG AAA AGA  720
Thr Leu Thr Ser Gln Thr Thr Lys Thr Asp Asp Pro Gln Leu Lys Arg
225                 230                 235                 240

GAA ATA AGG TTA ATG AAA AAC AGA GAA GCT GCT CGA GAA TGT CGC AGA  768
Glu Ile Arg Leu Met Lys Asn Arg Glu Ala Ala Arg Glu Cys Arg Arg
                245                 250                 255

AAG AAG AAA GAA TAT GTG AAA TGC CTG GAA AAC CGA GTT GCA GTC CTG  816
Lys Lys Lys Glu Tyr Val Lys Cys Leu Glu Asn Arg Val Ala Val Leu
            260                 265                 270

GAA AAT CAA AAT AAA ACT CTA ATA GAA GAG TTA AAA ACT TTG AAG GAT  864
Glu Asn Gln Asn Lys Thr Leu Ile Glu Glu Leu Lys Thr Leu Lys Asp
        275                 280                 285

CTT TAT TCC AAT AAA AGT GTT TGATTCCTAA GAAAGAAAAT ATTTTTGTGG  915
Leu Tyr Ser Asn Lys Ser Val
    290                 295

ACATGCATAA AAATTAAATG GATTTCCTAG TGGAGTTTT  954

(2) INFORMATION FOR SEQ ID NO:107:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 295 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:107:

Pro Thr Ser Tyr Pro Pro Gln Thr Gly Ser Tyr Ser Gln Ala Pro Ser
1               5                   10                  15

Gln Tyr Ser Gln Gln Ser Ser Ser Tyr Gly Gln Gln Ser Ser Phe Arg
            20                  25                  30

Gln Asp His Pro Ser Ser Met Gly Val Tyr Gly Gln Glu Ser Gly Gly
        35                  40                  45

Phe Ser Gly Pro Gly Glu Asn Arg Ser Met Ser Gly Pro Asp Asn Arg
    50                  55                  60

Gly Arg Gly Arg Gly Gly Phe Asp Arg Gly Gly Met Ser Arg Gly Gly
65                  70                  75                  80

Arg Gly Gly Gly Arg Gly Gly Met Gly Lys Ile Leu Lys Asp Leu Ser
                85                  90                  95

Ser Glu Asp Thr Arg Gly Arg Lys Gly Asp Gly Glu Asn Ser Gly Val
            100                 105                 110

Ser Ala Ala Val Thr Ser Met Ser Val Pro Thr Pro Ile Tyr Gln Thr
        115                 120                 125

Ser Ser Gly Gln Tyr Ile Ala Ile Ala Pro Asn Gly Ala Leu Gln Leu
    130                 135                 140

Ala Ser Pro Gly Thr Asp Gly Val Gln Gly Leu Gln Thr Leu Thr Met
145                 150                 155                 160

Thr Asn Ser Gly Ser Thr Gln Gln Gly Thr Thr Ile Leu Gln Tyr Ala
                165                 170                 175

Gln Thr Ser Asp Gly Gln Gln Ile Leu Val Pro Ser Asn Gln Val Val
            180                 185                 190

Val Gln Thr Ala Ser Gly Asp Met Gln Thr Tyr Gln Ile Arg Thr Thr
        195                 200                 205

Pro Ser Ala Thr Ser Leu Pro Gln Thr Val Val Met Thr Ser Pro Val
    210                 215                 220

Thr Leu Thr Ser Gln Thr Thr Lys Thr Asp Asp Pro Gln Leu Lys Arg
225                 230                 235                 240

Glu Ile Arg Leu Met Lys Asn Arg Glu Ala Ala Arg Glu Cys Arg Arg
                245                 250                 255

Lys Lys Lys Glu Tyr Val Lys Cys Leu Glu Asn Arg Val Ala Val Leu
            260                 265                 270

Glu Asn Gln Asn Lys Thr Leu Ile Glu Glu Leu Lys Thr Leu Lys Asp
        275                 280                 285

Leu Tyr Ser Asn Lys Ser Val
    290                 295

(2) INFORMATION FOR SEQ ID NO:108:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 27 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:108:

GGACGCGGTG GAATGGGCAG CGCTGGA    27

(2) INFORMATION FOR SEQ ID NO:109:

     (i) SEQUENCE CHARACTERISTICS:
                          (A) LENGTH: 21 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: double                                                      (D) TOPOLOGY: linear                                                 - -     (ix) FEATURE:                                                                  (A) NAME/KEY: CDS                                                             (B) LOCATION: 1..21                                                  - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:109:                             - - AAG CCT GGT GGA CCC ATG GAT       - #                  - #                      - #21                                                                  Lys Pro Gly Gly Pro Met Asp                                                     1               5                                                            - -  - - (2) INFORMATION FOR SEQ ID NO:110:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 7 amino - #acids                                                  (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                 - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:110:                             - - Lys Pro Gly Gly Pro Met Asp                                                1               5                                                            - -  - - (2) INFORMATION FOR SEQ ID NO:111:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 30 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: 
double                                                      (D) TOPOLOGY: linear                                                 - -     (ix) FEATURE:                                                                  (A) NAME/KEY: CDS                                                             (B) LOCATION: 1..30                                                  - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:111:                             - - GGA CGC GGT GGA ATG GGA AAA ATT TTG AAA  - #                  - #               30                                                                     Gly Arg Gly Gly Met Gly Lys Ile Leu Lys                                         1               5 - #                 10                                     - -  - - (2) INFORMATION FOR SEQ ID NO:112:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 10 amino - #acids                                                 (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                 - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:112:                             - - Gly Arg Gly Gly Met Gly Lys Ile Leu Lys                                    1               5 - #                 10                                     - -  - - (2) INFORMATION FOR SEQ ID NO:113:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 30 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: double                                                      (D) TOPOLOGY: linear                                                 - -     (ix) FEATURE:                                                        
          (A) NAME/KEY: CDS                                                             (B) LOCATION: 1..30                                                  - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:113:                             - - CGG CGC CCA TCT TAC AGA AAA ATT TTG AAA  - #                  - #               30                                                                     Arg Arg Pro Ser Tyr Arg Lys Ile Leu Lys                                         1               5 - #                 10                                     - -  - - (2) INFORMATION FOR SEQ ID NO:114:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 10 amino - #acids                                                 (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                 - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:114:                             - - Arg Arg Pro Ser Tyr Arg Lys Ile Leu Lys                                    1               5 - #                 10                                     - -  - - (2) INFORMATION FOR SEQ ID NO:115:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 30 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: double                                                      (D) TOPOLOGY: linear                                                 - -     (ix) FEATURE:                                                                  (A) NAME/KEY: CDS                                                             (B) LOCATION: 1..27                                                  - -     (xi) SEQUENCE DESCRIPTION: SEQ ID 
NO:115:                             - - CGG CGC CCA TCT TAC AGG ACC CAT GGA TGA  - #                  - #               30                                                                     Arg Arg Pro Ser Tyr Arg Thr His Gly                                             1               5                                                            - -  - - (2) INFORMATION FOR SEQ ID NO:116:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 9 amino - #acids                                                  (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                 - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:116:                             - - Arg Arg Pro Ser Tyr Arg Thr His Gly                                        1               5                                                            - -  - - (2) INFORMATION FOR SEQ ID NO:117:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:117:                             - - AGAAGGGTAC TTGTACATGG            - #                  - #                      - # 20                                                                   - -  - - (2) INFORMATION FOR SEQ ID NO:118:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 65 base - #pairs                                                  (B) 
TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:118:                             - - ATCGTTGAGA CTCGTACCAG CAGAGTCACG AGAGAGACTA CACGGTACTG GT -             #TTTTTTTT     60                                                                 - - TTTTT                 - #                  - #                  -      #            65                                                                  - -  - - (2) INFORMATION FOR SEQ ID NO:119:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:119:                             - - CCCACTAGTT ACCCACCCCA            - #                  - #                      - # 20                                                                  - -  - - (2) INFORMATION FOR SEQ ID NO:120:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 21 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:120:                             - - CGTTGAGACT CGTACCAGCA G           - #                  - #                      - #21                                                    
               - -  - - (2) INFORMATION FOR SEQ ID NO:121:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 22 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:121:                             - - TCCTACAGCC AAGCTCCAAG TC           - #                  - #                     22                                                                      - -  - - (2) INFORMATION FOR SEQ ID NO:122:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 22 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:122:                             - - TACCAGCAGA GTCACGAGAG AG           - #                  - #                     22                                                                      - -  - - (2) INFORMATION FOR SEQ ID NO:123:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 22 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:123:                             - - AACAGAGCAG CAGCTACGGG CA       
    - #                  - #                     22                                                                      - -  - - (2) INFORMATION FOR SEQ ID NO:124:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 23 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:124:                             - - CGAGAGAGAC TACACGGTAC TGG           - #                  - #                    23                                                                      - -  - - (2) INFORMATION FOR SEQ ID NO:125:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 24 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:125:                             - - AAAACTCCAC TAGGAAATCC ATTT          - #                  - #                    24                                                                      - -  - - (2) INFORMATION FOR SEQ ID NO:126:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -    
 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:126:                             - - CTGGGAGGGG GGAGTGGAAG            - #                  - #                      - # 20                                                                   - -  - - (2) INFORMATION FOR SEQ ID NO:127:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:127:                             - - GGGCCGATCT CTGCGCTCCT            - #                  - #                      - # 20                                                                   - -  - - (2) INFORMATION FOR SEQ ID NO:128:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 33 base - #pairs                                                  (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: double                                                      (D) TOPOLOGY: linear                                                 - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:128:                             - - CCA GAT CTT GAT CTA GGT TCA CTG CTG GCC TA - #T                  -      #         33                                                                    Pro Asp Leu Asp Leu Gly Ser Leu Leu Ala Ty - #r                                 1               5                                                            - -  - - (2) INFORMATION FOR SEQ ID NO:129:                                   - -      (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 11 amino - #acids                  
                               (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                 - -     (ii) MOLECULE TYPE: protein                                           - -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:129:                             - - Pro Asp Leu Asp Leu Gly Ser Leu Leu Ala Ty - #r                            1               5                                                         __________________________________________________________________________
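As an illustrative cross-check that is not part of the original listing, the amino acid translations printed beneath the coding sequences of SEQ ID NOS: 109, 111, 113, 115 and 128 can be recomputed with the standard genetic code. The following is a minimal Python sketch under that assumption; the names `translate`, `LISTING` and `check_listing` are hypothetical, and the sequences are copied from the listing above.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna, cds_end=None):
    """Translate dna from position 1 up to cds_end (1-based, inclusive).

    Translation stops at the first stop codon. The result uses
    three-letter residue codes separated by spaces, as in the listing.
    """
    coding = dna[:cds_end] if cds_end else dna
    residues = []
    for i in range(0, len(coding) - len(coding) % 3, 3):
        aa = CODON_TABLE[coding[i:i + 3]]
        if aa == "*":
            break
        residues.append(THREE_LETTER[aa])
    return " ".join(residues)


# SEQ ID NO -> (nucleotide sequence, CDS end, translation given in the listing).
# SEQ ID NO:128 has no CDS feature in the listing; its full length is used here.
LISTING = {
    109: ("AAGCCTGGTGGACCCATGGAT", 21, "Lys Pro Gly Gly Pro Met Asp"),
    111: ("GGACGCGGTGGAATGGGAAAAATTTTGAAA", 30,
          "Gly Arg Gly Gly Met Gly Lys Ile Leu Lys"),
    113: ("CGGCGCCCATCTTACAGAAAAATTTTGAAA", 30,
          "Arg Arg Pro Ser Tyr Arg Lys Ile Leu Lys"),
    115: ("CGGCGCCCATCTTACAGGACCCATGGATGA", 27,
          "Arg Arg Pro Ser Tyr Arg Thr His Gly"),
    128: ("CCAGATCTTGATCTAGGTTCACTGCTGGCCTAT", 33,
          "Pro Asp Leu Asp Leu Gly Ser Leu Leu Ala Tyr"),
}


def check_listing():
    """Return {SEQ ID NO: True/False} for agreement with the listed translation."""
    return {
        seq_id: translate(dna, cds_end) == expected
        for seq_id, (dna, cds_end, expected) in LISTING.items()
    }


if __name__ == "__main__":
    for seq_id, ok in check_listing().items():
        print(f"SEQ ID NO:{seq_id}: {'matches' if ok else 'differs'}")
```

In SEQ ID NO:115 the coding feature ends at nucleotide 27, and the remaining codon, TGA, is a stop codon in the standard code, which is consistent with the nine-residue peptide of SEQ ID NO:116.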

We claim:
 1. An isolated DNA sequence comprising nucleotides 1 to 2372 in Sequence ID No. 1, which DNA sequence is the Ews gene of chromosome 22.
 2. An isolated RNA sequence which is complementary to the DNA sequence shown in Sequence ID No. 1.
 3. An isolated DNA sequence comprising nucleotides 1 to 2796 in Sequence ID No. 3, which DNA is the Hum-Fli-1 gene of chromosome 11, and which gene translocates with the Ews gene of chromosome 22, and which translocation is associated with the development of Ewing's Sarcoma.
 4. A hybrid DNA sequence comprising a portion of the Ews gene as shown in Sequence ID No. 1 from its 5' end up to and including exon 7 of the Ews gene linked to a portion of a second gene from its 3' end up to and including exon 9 of the second gene, which second gene is selected from the group consisting of the Hum-Fli-1 gene of chromosome 11, the Erg gene of chromosome 21, and the Atf-1 gene of chromosome 12.
 5. The hybrid DNA sequence of claim 4 wherein the portion of the Ews gene comprises the nucleotide sequence of the Ews gene from its 5' end up to the EWSR1 region of the Ews gene at a translocation breakpoint.
 6. The hybrid DNA sequence of claim 4 which has a breakpoint at about nucleotide 820 of Sequence ID No. 1.
 7. The hybrid DNA sequence of claim 4 which has a breakpoint at about nucleotide 1072 of Sequence ID No. 1.
 8. The hybrid DNA sequence of claim 4 which comprises up to and including exon 8 of the second gene.
 9. The hybrid DNA sequence of claim 8 wherein the second gene is the Hum-Fli-1 gene.
 10. The hybrid DNA sequence of claim 9 wherein the portion of the Hum-Fli-1 gene comprises the nucleotide sequence of the Hum-Fli-1 gene from its 3' end to an EWSR2 region at a translocation breakpoint.
 11. The hybrid DNA sequence of claim 9 wherein the Hum-Fli-1 gene is under the control of an ectopic promoter, which is the Ews promoter.
 12. The hybrid DNA sequence of claim 8 wherein the second gene is the Erg gene.
 13. The hybrid DNA sequence of claim 8 wherein the second gene is the Atf-1 gene.
 14. The hybrid DNA sequence of claim 4 which comprises a foreign DNA sequence between the Ews gene and the second gene.
 15. An isolated polypeptide as shown as amino acids 1 to 656 in Sequence ID No. 2, which polypeptide is encoded by the Ews gene of chromosome 22.
 16. The isolated polypeptide of claim 15 wherein about 46% of the total amino acids from number 300 to 340, 454 to 513, and number 559 to 640 of Sequence ID No. 2 are glycine.
 17. The isolated polypeptide of claim 16 wherein about 19% of the total amino acids from number 300 to 340, 454 to 513, and number 559 to 640 of Sequence ID No. 2 are arginine.
 18. An isolated polypeptide comprising amino acids 1 to 452 in Sequence ID No. 4, which polypeptide is encoded by the Hum-Fli-1 gene of chromosome 11, and which gene translocates with the Ews gene of chromosome 22, and which translocation is implicated with the appearance of Ewing's Sarcoma.
 19. A fusion protein comprising a portion of the Ews polypeptide from the amino terminal to amino acid 265, as shown in Sequence ID No. 2 and a portion comprising the carboxy terminal of a second polypeptide selected from the group consisting of polypeptides encoded by the Hum-Fli-1 gene of chromosome 11, the Erg gene of chromosome 21, and the Atf-1 gene of chromosome 12, wherein the carboxy terminal of the second polypeptide is encoded by the second gene portion of claim 4.
 20. The fusion protein of claim 19 wherein the second polypeptide is Hum-Fli-1.
 21. The fusion protein of claim 19 wherein the second polypeptide is Erg.
 22. The fusion protein of claim 19 wherein the second polypeptide is Atf-1.
 23. The fusion protein which is encoded by the hybrid DNA sequence of claim 4.
 24. The fusion protein of claim 19 which comprises the portion of the Ews polypeptide from the amino terminal to amino acid 349, as shown in Sequence ID No. 2.
 25. A complementary nucleic acid sequence for hybridizing to the hybrid DNA sequence of claim 4 or to an mRNA sequence transcribed by the hybrid DNA sequence.
 26. A kit for determining the presence in a human cell of a translocation involving chromosome 22, which kit comprises: a probe which hybridizes to a hybrid DNA sequence comprising a portion of the Ews gene and a portion of a gene selected from the group consisting of Erg and Atf-1, or mRNA transcribed from the hybrid DNA sequence, wherein the probe hybridizes to both the Ews portion and the Erg or Atf-1 portion of the DNA sequence or the mRNA, and control specimens of DNA or RNA.
 27. A method for determining the presence in a patient of a translocation of the Ews gene of chromosome 22 and a gene selected from the group comprising the Erg gene of chromosome 21 and the Atf-1 gene of chromosome 12, which method comprises treating a biological specimen from the patient to render the nucleic acids in the specimen accessible to a nucleic acid probe, contacting the specimen with a probe which hybridizes to the hybrid DNA sequence of claim 4 or to an mRNA sequence transcribed by the hybrid DNA sequence, wherein the probe hybridizes to both the Ews portion and the Hum-Fli-1, Erg or Atf-1 portion of the DNA sequence or the mRNA, and detecting the hybrids of the hybrid DNA sequence or the mRNA and the probe, the presence of which is diagnostic of the translocation.
 28. A method for diagnosing Ewing's Sarcoma in a patient, which method comprises: treating a biological specimen from the patient to render the nucleic acids in the specimen accessible to a nucleic acid probe, contacting the specimen with a probe which hybridizes to the hybrid DNA sequence of claim 4 or to an mRNA sequence transcribed by the hybrid DNA sequence, wherein the probe hybridizes to both the Ews portion and the Hum-Fli-1, Erg or Atf-1 portion of the DNA sequence or the mRNA, detecting the hybrids of the DNA or mRNA sequence and the probe, the presence of which is diagnostic of Ewing's sarcoma.
 29. A kit for determining the presence in a human cell of a translocation involving chromosome 22, which kit comprises a reverse transcriptase and PCR primers which amplify cDNA comprising a portion of the Ews gene and a portion of a gene selected from the group consisting of Erg and Atf-1, wherein the portions of the Ews, Erg, and Atf-1 genes are as defined in claim 4, and control specimens of DNA or RNA.
 30. The hybrid DNA of claim 12, which comprises a junction sequence between the Ews and Erg genes selected from the group consisting of the sequences shown in Sequence ID Nos. 98, 100, 102, and 104.
 31. An isolated fusion protein resulting from a chromosomal translocation t(21;22), present in tumor cells, wherein the fusion protein comprises the sequence of amino acids encoded by the hybrid DNA of claim 30.
 32. The protein of claim 31, which comprises the N-terminal of the EWS protein and the C-terminal of the ERG protein.
 33. A method for diagnosing Ewing's sarcoma, comprising the following steps: extracting mRNA from a biological specimen from tumor cells of a patient that are likely to have a chromosomal translocation t(21;22); synthesizing a cDNA complementary to said mRNA by reverse transcription of the mRNA; amplifying the cDNA; analyzing the amplified products; and detecting the presence of a product of the fusion gene resulting from the translocation t(21;22), thereby diagnosing Ewing's sarcoma.
 34. The hybrid DNA of claim 13, which comprises the part of the nucleotide sequence of the Ews gene up to the EWSR1 region at the level of which the breakpoint of chromosome 22 is located, and the part of the nucleotide sequence of the Atf-1 gene from the region at the level of which the breakpoint of chromosome 12 is located in this translocation, to its 3' end.
 35. The hybrid DNA of claim 13 which comprises the sequence shown in Sequence ID No. 106.
 36. The hybrid DNA of claim 35 which comprises a junction sequence between the Ews and Atf-1 genes selected from the group consisting of sequences shown in Sequence ID Nos. 111, 113, and 115.
 37. An isolated fusion protein resulting from a chromosomal translocation t(12;22), present in tumor cells, which comprises the sequence of amino acids coded by the hybrid DNA of claim 35.
 38. The protein of claim 37 which comprises the N-terminal of the EWS protein and the C-terminal of the Atf-1 protein.
 39. The hybrid DNA of claim 37, wherein its sequence of amino acid comprises the sequence shown in Sequence ID No. 106.
 40. A method for diagnosing soft tissue malignant melanoma, comprising the following steps: extracting mRNA from a biological specimen from tumor cells of a patient that are likely to have a chromosomal translocation t(12;22); synthesizing a cDNA complementary to the mRNA by reverse transcription of the mRNA; amplifying the cDNA; analyzing the amplified products; and detecting the presence of a product of the fusion gene resulting from the translocation t(12;22), thereby diagnosing soft tissue malignant melanoma.
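
Claims 26 to 28 require a probe that hybridizes to both the Ews portion and the partner-gene portion of the hybrid sequence, that is, a probe that spans the junction. The following Python sketch, which is illustrative only and not part of the claims, shows one way such a requirement might be checked in silico by exact string matching. The target is the junction sequence of SEQ ID NO:111; the breakpoint index, the minimum flank of 8 nucleotides, the two example probes and all function names are hypothetical assumptions, and exact matching is only a crude stand-in for real hybridization behaviour.

```python
# Complement map for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_spans_junction(target, probe, breakpoint, min_flank=8):
    """Check whether an antisense probe anneals across a breakpoint.

    target     : sense-strand sequence containing the junction
    probe      : antisense probe, written 5' to 3'
    breakpoint : 0-based index of the first base after the junction
                 (an assumption supplied by the caller)
    min_flank  : minimum number of annealed bases required on each side
    """
    site = reverse_complement(probe)      # region of target the probe pairs with
    start = target.find(site)             # exact-match search only
    if start < 0:
        return False
    end = start + len(site)
    return (breakpoint - start) >= min_flank and (end - breakpoint) >= min_flank


# Junction sequence copied from SEQ ID NO:111.
JUNCTION_111 = "GGACGCGGTGGAATGGGAAAAATTTTGAAA"

# Hypothetical breakpoint position, chosen only for illustration.
ASSUMED_BREAKPOINT = 16

# Hypothetical probes: one covering both sides, one confined to the 5' side.
spanning_probe = reverse_complement(JUNCTION_111[6:26])
one_sided_probe = reverse_complement(JUNCTION_111[0:14])

if __name__ == "__main__":
    print(probe_spans_junction(JUNCTION_111, spanning_probe, ASSUMED_BREAKPOINT))
    print(probe_spans_junction(JUNCTION_111, one_sided_probe, ASSUMED_BREAKPOINT))
```

Under these assumptions the first probe covers ten bases on each side of the assumed breakpoint and is accepted, whereas the second lies entirely on the 5' side and is rejected. A practical probe design would also have to consider mismatches, melting temperature and specificity, none of which this sketch addresses.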